ABSTRACT

This  report  examines  data  link  requirements  for  a  portable  unmanned  aerial  vehicle. 
Crucial  to  the  operation  of  such  a  data  link  is  the  development  of  suitable  computer 
algorithms  that  are  capable  of  significantly  compressing  and  reconstructing  image  data 
in a timely manner for viewing at a remote station. As a consequence of the near
real-time requirement we investigate recent advances in lossy data compression
techniques, concentrating on transform coding techniques involving the discrete cosine
transform,  fractals  and  wavelets.  At  present  the  discrete  cosine  transform  is  available 
on a microprocessor chip and can offer acceptable reconstructed images close to real
time with compression ratios of up to 35:1, but other techniques promise even higher
compression ratios and possibly a near real-time capability in the not too distant
future.

Executive Summary


In  this  report  we  review  the  current  state  of  the  art  regarding  three  of  the  most 
prominent  image  compression  techniques,  namely  discrete  cosine,  fractal  and  wavelet 
transform  coding  techniques.  Our  interest  in  these  techniques  arises  out  of  a  desire  to 
achieve  as  much  compression  as  possible  in  the  real  time  transmission  of  image  data 
from  the  sensor  payload  onboard  a  portable  Unmanned  Aerial  Vehicle  (p-UAV)  to  a 
ground control station similar in form to a laptop computer. In particular, we find that
compression ratios of greater than 30:1 are required in order to receive VGA images of
640X400  resolution  and  24  bit  colour  at  a  TV  frame  rate  of  25  Hz.  This  would  be 
suitable  for  Line  Of  Sight  (LOS)  naval  surveillance  operations.  Even  greater 
compression  ratios  are  required  to  receive  images  of  320X200  resolution  and  8  bit  grey 
scale  at  a  frame  rate  of  1  Hz  which  would  allow  for  transmission  along  a  High 
Frequency  (HF)  data  link  in  land  based  reconnaissance  operations.  An  HF  data  link 
removes the need for LOS transmission, although it may be beyond the present
capabilities of all three compression techniques to produce images of acceptable
quality.  For  LOS  transmission  in  the  Very  and  Ultra  High  Frequency  bands  image 
compression  would  still  be  required,  particularly  if  transmission  were  to  occur  at  the 
TV  frame  rate.  Even  for  lower  frame  rates  image  compression  is  desirable  either  to 
reduce the demand on the limited power source of the p-UAV, extend the vehicle's
range  of  operation  or  make  transmission  more  jam  resistant. 

Of  the  three  lossy  techniques,  the  most  popular  is  the  Discrete  Cosine  Transform 
(DCT)  technique,  which  is  based  on  discrete  Fourier  transform  theory.  Although  still 
undergoing  further  development  as  described  in  this  report,  the  DCT  technique  has 
already  met  the  standard  for  image  compression  put  forward  by  the  Joint 
Photographic Experts Group (JPEG). The DCT is limited in the amount of compression
that can be achieved without serious degradation of the data resulting in block
artefacts appearing on reconstructed images. Thus we consider the non-conventional
techniques of fractal and wavelet transform coding, which promise even higher
compression  ratios  than  the  DCT. 

Fractal  transform  coding  relies  on  the  fact  that  many  real  world  objects  possess  local 
self-similarity  and  can  be  described  in  terms  of  fractal  transformations.  These  can  be 
transmitted along a communications channel using less bandwidth than the pixel data
of  the  original  digital  image.  Fractal  images  not  only  provide  a  resolution  independent 
image  of  the  original,  but  can  also  yield  very  high  compression  ratios.  However,  at 
present it is questionable whether fractal coding is feasible for real-time applications,
and this report covers recent research directed at this question. In particular, we discuss
recent attempts aimed at reducing the time expended in searching the domain blocks
for each range block of the original image during encoding. We also describe the
Accurate Fractal Rendering Algorithm, which enables the fast decoding of video
streams. These developments offer real hope that a fractal encoding/decoding system
will  be  available  for  near  real-time  applications  in  the  not  too  distant  future. 

Wavelets  can  be  viewed  as  bumps  that  can  be  squeezed  or  expanded  by  dilation  and 
shifted  by  translation.  An  arbitrary  function  can  be  decomposed  into  a  series  of 
wavelets  forming  a  complete  orthonormal  set,  the  underlying  principle  behind  wavelet 
transform  coding.  Wavelet  coding  has  attracted  much  interest  over  the  past  few  years, 
mainly  because  it  can  bring  about  a  reduction  in  the  block  artefacts  associated  with  the 
DCT.  Thus,  it  promises  better  quality  reconstructed  images  at  higher  compression 
ratios than the DCT. At present it is unable to match the real-time performance of the
DCT and may never reach the compression ratios of fractal transform coding. With further advances in
microprocessor  technology  and  in  optimising  the  software  approaches  described  in 
this  report,  there  is  more  than  a  possibility  that  this  technique  can  be  applied  to  near 
real-time  applications  soon. 

The  value  to  Defence  of  this  work  is  a  greater  understanding  of  the  current  state  of 
lossy  image  compression  techniques  for  possible  implementation  in  communication 
systems where large amounts of data are required to be transmitted over narrow
bandwidths.

1. Introduction


In  a  previous  report  [1]  Cameron  and  Kowalenko,  hereafter  referred  to  as  CK, 
discussed  the  feasibility  of  a  portable  Unmanned  Aerial  Vehicle  (p-UAV)  for 
deployment in various close range reconnaissance and surveillance missions currently
being conducted by the Australian Defence Forces (ADF). It was expected that such a
system would provide a capability estimated to be 50 to 60% of the performance of
much larger and significantly more expensive systems. Also discussed in the report
was the need to employ data compression techniques when considering the
transmission of realtime image data from the vehicle to the Ground Control Station
(GCS). They pointed out that in order to transmit realtime TV pictures a large
bandwidth was required, which meant that transmission could only occur in the Very
High  Frequency  (VHF)  and  Ultra  High  Frequency  (UHF)  bands.  Hence,  the  range  of 
the  p-UAV  was  restricted  to  Line  Of  Sight  (LOS)  operations. 

In  CK  it  was  stated  that  a  combination  of  reducing  the  frame  rate  and  compressing  the 
transmitted  data  would  bring  about  a  decrease  in  the  bandwidth,  thereby  reducing  the 
power  requirements  of  the  system  substantially.  Thus,  it  was  proposed  that  in  the 
short  term  the  data  link  for  the  p-UAV  could  employ  the  conventional  transform 
coding  techniques  to  compress  data  in  the  VHF/UHF  bands,  which  in  turn  would 
provide  the  p-UAV  with  an  operational  range  of  about  30  km  using  a  directional 
antenna.  A  longer  term  goal  might  be  to  employ  more  novel  data  compression 
techniques  offering  even  higher  compression  ratios  such  as  fractal  and  wavelet 
transform  coding  combined  with  the  reduced  frame  rate.  This  could  either: 


(a)  extend  the  range  of  the  p-UAV; 

(b)  remove  LOS  limitations  by  operating  in  the  lower  frequency  HF  band; 

(c)  reduce  directional  antenna  requirements; 

(d)  improve  jam  resistance. 

In this report we aim to investigate the current state of the art with regard to lossy data
compression techniques being employed in the transmission of image data over high
frequency channels. Although many transform coding techniques exist [2], we shall be
concerned primarily with the Discrete Cosine Transform (DCT) technique, which has
become the standard for the Joint Photographic Experts Group, more commonly
known as JPEG, and the non-conventional fractal and wavelet transform techniques.
Although the latter two promise higher compression ratios with less visible
degradation than the former technique in specific applications, they are still evolving
and as a consequence, they have not as yet replaced the DCT as the principal image
compression technique. In particular, as we shall see, it is only due to advances in the
last few years that fractal transform coding has been able to offer the possibility of a
near realtime capability, which is so crucial in receiving image data from a UAV.
Realtime fractal transform coding is currently receiving much attention and we aim to
discuss these developments in the present report.

The  contents  of  this  report  are  arranged  as  follows.  Section  2  contains  a  summary  of 
the  basic  principles  of  data  transmission,  which  are  necessary  for  understanding  why 
transmitted  images  need  to  be  compressed.  The  next  section  discusses  the  basic  theory 
of electromagnetic propagation required for selecting an appropriate carrier frequency for
the  transmission  of  data  between  the  p-UAV  and  the  GCS.  In  Section  4  we  present 
basic  information  theory  that  not  only  clarifies  the  need  for  employing  data 
compression  techniques,  but  can  also  be  used  to  evaluate  them.  Section  5  contains  a 
description  of  the  DCT  both  from  a  theoretical  and  practical  point  of  view.  In  Secs.  6 
and  7  we  describe  the  current  state  of  the  art  regarding  the  non-conventional 
techniques of fractal and wavelet transform coding. In discussing the three data
compression  techniques  we  relegate  the  mathematics  to  three  separate  appendices. 
Section  8  concludes  with  an  evaluation  of  the  techniques  in  regard  to  the  p-UAV. 


2.  Modulation  Techniques 

Data  links  are  not  only  required  in  all  UAV  systems  for  the  transmission  of  realtime 
data  but  also  for  the  navigation  and  control  of  the  vehicle.  Data  link  requirements 
which  may  include  the  need  for  data  compression  are  determined  primarily  by  the  rate 
of  data  transmission  between  the  vehicle  and  the  GCS.  Specifically,  an  uplink  is 
required  for  manoeuvring  the  vehicle  via  the  GCS  whereas  two  downlinks  are 
required  respectively  for  monitoring  the  vehicle's  position  and  for  the  transmission  of 
image  data  collected  by  the  sensor  payload  on  the  vehicle.  The  second  downlink  is 
referred  to  as  a  wideband  downlink  since  the  transmission  of  image  data  requires  a 
much larger bandwidth than that required for the narrow links connected with
navigation and control of the vehicle. Typical transmission rates for uplink control
signals are less than 10 kHz, while the transmission of sensor payload data
may require a transmission rate greater than 10 MHz, depending on mission
requirements for the UAV.

In  a  telecommunication  system  the  first  requirement  is  that  the  original  information 
energy  is  converted  or  modulated  into  electronic  signals  [3].  These  signals  may  then 
require  amplification  to  increase  the  power  levels  before  they  are  transmitted  to  a 
receiver  at  the  destination.  On  reception  the  signals  may  be  amplified  again  before 
being converted or demodulated into recognisable replicas of the original information.
Thus, a complete system [4] consists of:

(a) a transmitter (which includes the source of the original information),

(b)  the  transmission  medium,  and 

(c) a receiver.

In order to transmit data efficiently, sinusoidal electromagnetic signals, known as
carrier signals, are modulated by superimposing information signals on them.
Common forms of modulation are:

(a)  Frequency  Modulation  (FM), 

(b)  Amplitude  Modulation  (AM), 

(c)  Phase  Shift  Key  (PSK), 

(d)  Frequency  Shift  Key  (FSK), 

(e)  pulse  modulation  which  includes  amongst  others  Pulse  Amplitude  Modulation 
(PAM),  and 

(f)  Pulse  Code  Modulation  (PCM). 

In addition, variants of these modulation methods exist, for example, Differential PCM
(DPCM),  Quadrature  or  Quaternary  PSK  (QPSK)  and  Differentially-encoded  QPSK 
(DQPSK  or  4-phase  DPSK).  The  last  technique  is  efficient  whilst  allowing  reliable 
reception  with  a  simple  demodulator  [5]. 

FM  and  AM  are  essentially  continuous  wave  modulation  techniques,  which  involve 
analogue information signals with the former conserving transmitter power better
than  the  latter  [6].  However,  here  we  concentrate  on  the  transmission  of  digital  signals 
rather  than  analogue  signals  since  there  are  definite  advantages  in  adopting  a  digital 
mode  of  transmission.  In  particular,  digital  signals  are  robust  and  can  be  easily 
processed  [7]  and  because  of  their  regenerative  property,  they  can  be  transmitted  over 
long  distances  through  multiple  switching  centres  and  relay  links  with  little  noise 
interference  or  impairment.  In  addition,  they  are  easily  multiplexed,  switched  or 
recorded.  Thus,  a  digital  data  link  will  be  less  affected  by  noise,  which,  in  turn,  means 
that less power is required to transmit signals with the same bandwidth. Furthermore,
since CK proposed that the sensor payload consist of a CCD camera combined with a
second or third generation image intensifier in Ref. [1], compatibility with such a
sensor payload is easier to achieve by employing a digital transmission system rather
than an analogue transmission system.

The  remaining  modulation  techniques  given  in  the  above  list  operate  on  digital  or 
binary  data,  although  pulse  modulation  (PM)  systems  require  some  form  of  c-w 
modulation  in  transmitting  data  [4].  In  a  PCM  system  signal  information  is  transmitted 
in digital form by sampling an analogue signal at regular intervals to produce a pulse
amplitude  modulated  signal.  Therefore,  a  PCM  system  can  utilise  solid  state  digital 
components  [8].  Consequently,  PCM  tends  to  be  favoured  in  applications  where  mass, 
cost  and  power  consumption  need  to  be  minimised. 
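
The sampling and quantising steps of a PCM system can be sketched as follows. This is a minimal illustration only: the 8 kHz sample rate, the 1 kHz test tone and the 8 bit uniform quantiser are assumptions chosen for the example and are not taken from Ref. [8].

```python
import math

def pcm_encode(signal, sample_rate_hz, duration_s, bits=8):
    """Sample an analogue signal (a function of time with values in [-1, 1])
    at regular intervals and quantise each sample to a `bits`-bit code."""
    levels = 2 ** bits
    n_samples = round(sample_rate_hz * duration_s)
    codes = []
    for n in range(n_samples):
        sample = signal(n / sample_rate_hz)        # pulse amplitude sample
        sample = max(-1.0, min(1.0, sample))       # clip to the quantiser range
        code = min(levels - 1, int((sample + 1.0) / 2.0 * levels))
        codes.append(code)
    return codes

def pcm_decode(codes, bits=8):
    """Map each code back to the mid-point of its quantisation interval."""
    levels = 2 ** bits
    return [(c + 0.5) / levels * 2.0 - 1.0 for c in codes]

def tone(t):
    """Assumed test signal: a 1 kHz sine wave."""
    return math.sin(2.0 * math.pi * 1000.0 * t)

codes = pcm_encode(tone, sample_rate_hz=8000, duration_s=0.001)
bitstream = "".join(format(c, "08b") for c in codes)   # binary data to be transmitted
decoded = pcm_decode(codes)
```

With a uniform quantiser of this kind the reconstruction error of each sample is bounded by one quantisation step, so adding bits per sample trades a higher transmission rate for lower quantisation noise.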

Before proceeding further with our evaluation of modulation techniques, we need to
consider the display of images at the GCS. Although capable of offering higher
resolutions at cheaper prices and of producing images with significantly less glare,
Cathode Ray Tube (CRT) screens are more bulky, operate at higher voltages, consume
more  power  and  are  less  robust  than  Liquid  Crystal  Displays  (LCDs).  For  these  reasons 
LCDs  are  used  in  laptop  computers.  They  also  come  in  different  dimensions  with  a 
variety  of  resolutions,  the  most  common  being  320X200,  640X400  and  640X480.  In 
order  to  display  images  with  adequate  detail,  more  levels  of  grey  (or  colour)  are 
required  for  the  low  resolution  screens  than  for  the  higher  resolution  screens.  Thus  the 
number of colours required to obtain adequate imagery for a resolution of 320X200
would  be  at  least  256  whereas  for  a  resolution  of  640X480  perhaps  only  two  colours 
(black  and  white)  are  required  since  this  resolution  is  superior  to  that  for  newspapers. 
A  more  detailed  discussion  of  the  relative  merits  of  both  CRTs  and  LCDs  is  given  in 
Ref.  [9]. 

FSK  and  PSK  are  commonly  used  in  the  transmission  of  data  along  computer  and 
printer  communication  lines  or  cables.  Since  our  aim  is  to  produce  digitised  images  on 
a  laptop  LCD  (one  of  the  requirements  for  the  p-UAV  presented  in  Ref.  [1]),  variants  of 
FSK  and  PSK  become  the  preferred  modulation  techniques.  For  covert  transmission 
the  preferred  modulation  technique  is  a  PSK  variant  because  it  provides  a  constant 
amplitude,  thereby  permitting  the  power  density  to  be  spread  evenly  over  the  entire 
extended  electromagnetic  spectrum  [10].  One  disadvantage  of  coherent  PSK  systems, 
however,  is  that  the  receiver  requires  a  good  phase  reference,  which  is  difficult  to 
achieve  in  practice.  Thus  the  signal  is  degraded  by  extracting  a  phase  reference  from 
the  transmitted  signal. 
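
The way in which a differentially-encoded PSK variant avoids the need for an absolute phase reference can be illustrated with a short sketch. The bit-pair to phase-step mapping below is an assumed Gray-coded convention, and the channel is idealised as a fixed, unknown phase rotation with no noise; neither is taken from Ref. [5].

```python
import cmath
import math

# Assumed mapping: each pair of bits selects a phase increment (Gray-coded).
PHASE_STEP = {
    (0, 0): 0.0,
    (0, 1): math.pi / 2,
    (1, 1): math.pi,
    (1, 0): 3 * math.pi / 2,
}

def dqpsk_encode(bits):
    """Return constant-amplitude symbols whose phase changes carry the data."""
    if len(bits) % 2:
        raise ValueError("an even number of bits is required")
    phase = 0.0
    symbols = [cmath.exp(1j * phase)]              # initial reference symbol
    for i in range(0, len(bits), 2):
        phase += PHASE_STEP[(bits[i], bits[i + 1])]
        symbols.append(cmath.exp(1j * phase))
    return symbols

def dqpsk_decode(symbols):
    """Recover the bits from the phase difference between successive symbols."""
    inverse = {round(step / (math.pi / 2)) % 4: pair for pair, step in PHASE_STEP.items()}
    bits = []
    for previous, current in zip(symbols, symbols[1:]):
        difference = cmath.phase(current * previous.conjugate())
        bits.extend(inverse[round(difference / (math.pi / 2)) % 4])
    return bits

message = [0, 1, 1, 1, 1, 0, 0, 0]
sent = dqpsk_encode(message)
received = [s * cmath.exp(0.7j) for s in sent]     # unknown constant phase offset
recovered = dqpsk_decode(received)
```

Because only the difference between successive symbols is used, the unknown offset cancels and the receiver does not have to extract a phase reference, at the cost of one extra reference symbol per transmission.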

The  frequency  spectrum  of  a  modulated  signal  is  usually  symmetric  about  the  carrier 
frequency  except  for  SSB  transmissions.  The  combined  parts  of  the  spectrum  above 
and  below  the  carrier  frequency  form  the  bandwidth  of  the  signal.  For  the  case  where 
sensor data are transmitted in TV format the bandwidth depends on the number of
picture elements per frame and the rate of transmission [8]. One important aspect in
transmitting large amounts of data over a single communication link is the provision
of pre-allocated channels that are capable of passing high enough frequencies, i.e.
channels of large enough bandwidth. Commercial TV channels require about 6 MHz,
but the bandwidth required for the transmission of a signal is dependent on the type of
modulation  employed  by  the  telecommunication  system. 
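
The dependence on the number of picture elements and the frame rate can be made concrete by computing the uncompressed data rate for two image formats considered in this report. The function name is our own, and the calculation is a sketch that ignores synchronisation, framing and error-correction overheads.

```python
def raw_bit_rate(width, height, bits_per_pixel, frame_rate_hz):
    """Uncompressed data rate in bit/s for a stream of digitised frames."""
    return width * height * bits_per_pixel * frame_rate_hz

colour_tv = raw_bit_rate(640, 400, 24, 25)   # 24 bit colour at the 25 Hz TV frame rate
grey_slow = raw_bit_rate(320, 200, 8, 1)     # 8 bit grey scale at 1 Hz

print(f"640X400, 24 bit, 25 Hz: {colour_tv / 1e6:.1f} Mbit/s")   # 153.6 Mbit/s
print(f"320X200,  8 bit,  1 Hz: {grey_slow / 1e3:.1f} kbit/s")   # 512.0 kbit/s
print(f"640X400 stream after 30:1 compression: {colour_tv / 30 / 1e6:.2f} Mbit/s")  # 5.12 Mbit/s
```

Even the low resolution, low frame rate case produces several hundred kbit/s before compression, which is why the choice of image format and frame rate dominates the data link requirement.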

As the modulated signals traverse the transmission medium or communication
channel, they always become distorted due to additive noise (contamination by
unwanted signals) and interference, which, in turn, place limitations on the
transmission of data. At the receiver the best possible replica of the original
information signal or message is obtained by demodulation, i.e. the removal of the
carrier signal from the modulated signal and then filtering to remove noise.


3. Propagation

Because  received  messages  are  not  perfect  replicas  of  the  original  message,  there  is 
always a certain amount of uncertainty involved in decoding the received data. This,
however, can be reduced by increasing the power of the original signal. Thus the ratio
of the average signal power to the noise power (SNR) is an indication of the
uncertainty or error in the received data. An increase in bandwidth allows a signal to
be converted into a form that makes it more immune to noise. To transmit a given
amount of data, either the signal power can be increased and the bandwidth reduced
or the signal power can be reduced and the bandwidth increased.

The  energy  radiated  by  a  transmitter  can  reach  the  receiving  station  or  GCS  by  using 
one  or  more  of  the  following  modes  [3,11]: 

(a) surface or ground waves, which propagate by following the earth's curvature to
distances of 15 to 110 km depending on frequency. An appreciable amount of the
energy of electromagnetic waves is dissipated in this mode. These waves have
frequency bands of below 300 Hz, 300 Hz-3 kHz, 3-30 kHz, 30-300 kHz, 300
kHz-3 MHz and 3-30 MHz, which correspond to Extremely Low Frequency (ELF),
Infra Low Frequency (ILF), Very Low Frequency (VLF), Low Frequency (LF), Medium
Frequency (MF) and High Frequency (HF) waves respectively.

(b)  sky  waves,  which  are  used  for  HF  radio  communications  systems  including  long 
distance radio telephony and sound broadcasting. HF sky waves can be used on
different portions of the frequency range 3-30 MHz at different times of the day.
Sky waves are directed into the ionosphere and under certain conditions can be
reflected to the required destination. The maximum and minimum useable
frequencies for transmission change during the day. Hence, an operating
frequency must be chosen from this range, which places a further restriction on
the bandwidth for the transmission of signals in the HF band. For example, at 1400
hr the difference between the two frequencies is about 16 MHz whereas at about
midnight the difference is about 2 MHz. In addition, interference from other users
and  atmospheric  noise  contribute  even  further  to  channel  limitations.  For  more 
detailed  information  about  the  propagation  characteristics  of  HF  waves,  see  Ref. 
[12]. 

(c)  space  or  LOS  waves,  which  are  utilised  for  both  sound  and  television  broadcasting 
and  operate  in  the  VHF  (Very  High  Frequency),  UHF  (Ultra  High  Frequency)  and 
SHF (Super High Frequency) bands of 30-300 MHz, 300 MHz-3 GHz and 3-30 GHz
respectively.  These  waves  are  dependent  upon  the  distance  to  the  horizon. 

(d) via satellite systems such as the GPS system described in the previous section.
Satellite systems can be employed to carry multi-channel telephony systems, TV
signals and data in the UHF and SHF bands.

(e) scatter systems, which operate in the UHF and SHF bands. These are employed in
multi-channel telephony links.
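
The band designations quoted in the list above can be collected into a simple lookup, which is convenient when comparing candidate carrier frequencies. The table reproduces only the ranges given in the text; treating each upper limit as exclusive is our own assumption.

```python
# (upper limit in Hz, band name), following the frequency ranges listed above.
BANDS = [
    (300.0, "ELF"),
    (3e3, "ILF"),
    (30e3, "VLF"),
    (300e3, "LF"),
    (3e6, "MF"),
    (30e6, "HF"),
    (300e6, "VHF"),
    (3e9, "UHF"),
    (30e9, "SHF"),
]

def band_name(frequency_hz):
    """Return the band containing the frequency, or None if it lies above SHF."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    for upper_limit_hz, name in BANDS:
        if frequency_hz < upper_limit_hz:
            return name
    return None

for f in (5e6, 100e6, 15e9):
    print(f"{f:.3g} Hz lies in the {band_name(f)} band")
```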

For  a  discussion  of  electromagnetic  propagation  problems  in  tactical  environments 
such  as  battlefields  and  aircraft  systems  the  reader  is  referred  to  Ref.  [13]. 

In  data  transmission,  a  binary  pulse  is  a  pulse  with  one  of  only  two  possible 
amplitudes  or  states  whilst  a  binary  message  or  signal  is  a  sequence  of  binary  pulses 
occurring  at  regularly  spaced  intervals  of  say  1/R  sec  or  at  a  rate  of  R/sec  [4].  A  bit  is 
defined as the maximum amount of information that can be transmitted in a single
binary pulse [14]. The uplink radio control for a UAV requires a mean data rate of up
to 1 kbit/s, and a similar transmission rate is required for the telemetry data on the
downlink for the necessary flight and management functions [6].

All telecommunication systems are capable of transmitting a maximum number of bits
per second without loss in a channel. This is known as the channel capacity and is the
central concept of data communication [15]. In addition to bandwidth, the channel
capacity is limited by the Signal to Noise Ratio (SNR) for the system [14]. If a channel
were free from noise, then the channel capacity would be infinite but since there is
always some noise in the process of transmission, the channel capacity is always finite.
For SNRs greater than 1 the channel capacity exceeds the bandwidth assuming that the
noise  is  Gaussian  and  white.  To  cite  interesting  examples  of  channel  capacity,  the 
maximum  downlink  rate  for  the  Space  Shuttle  is  48  Mbit/s  while  that  for  the  proposed 
Space  Station  Freedom  is  75  Mbit/s  [2]. 
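
For white Gaussian noise the relationship between capacity, bandwidth and SNR is given by the Shannon-Hartley formula, C = B log2(1 + SNR), which reproduces the statement that the capacity exceeds the bandwidth once the SNR exceeds 1. The sketch below evaluates it; the 3 kHz bandwidth is used purely as an illustrative value.

```python
import math

def channel_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bit/s for white Gaussian noise."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

def db_to_linear(snr_db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10.0 ** (snr_db / 10.0)

bandwidth_hz = 3000.0                               # illustrative narrow channel
for snr_db in (0.0, 3.0, 10.0, 20.0):
    capacity = channel_capacity(bandwidth_hz, db_to_linear(snr_db))
    print(f"SNR {snr_db:4.1f} dB -> {capacity / 1e3:.2f} kbit/s")
```

At an SNR of 1 (0 dB) the capacity equals the bandwidth numerically, and it grows only logarithmically with further increases in signal power, which is why bandwidth is usually the more effective resource to trade.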

Current  UAV  transmission  frequencies  are  restricted  to  an  upper  limit  of  15  GHz 
whereas  the  lowest  frequency  when  using  an  omnidirectional  antenna  is  2  MHz  [10]. 
In  a  hostile  ECM  environment  the  effective  jamming  zone  for  a  UAV  can  be  reduced 
significantly,  if  the  UAV  is  equipped  with  a  directional  antenna,  but  this  may  require  a 
higher  transmission  frequency.  There  are  several  factors  in  selecting  an  appropriate 
carrier  frequency  for  a  data  link,  which,  in  turn,  influence  the  cost,  mass  and  power 
consumption  of  a  UAV  [8].  The  minimum  detectable  signal  power  at  a  receiver  is 
inversely  proportional  to  the  square  of  the  frequency  and  hence,  a  lower  frequency  is 
more  desirable.  However,  for  a  fixed  antenna  size,  increasing  the  frequency  results  in 
increased  gain,  although  the  beamwidth  is  narrowed.  Lower  frequencies  are  also 
preferred  to  minimise  the  cost  and  mass  of  link  components  and  atmospheric  loss 
increases  with  increasing  frequency.  For  naval  applications  vertically  polarised 
radiation  is  preferred  since  multipath  effects  are  greater  for  horizontally  polarised 
radiation.  Different  carrier  frequencies  separated  by  several  signal  bandwidths  should 
be  used  for  both  the  uplink  and  downlink. 
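
The trade-off between gain and beamwidth for a fixed antenna size can be illustrated with the standard approximations for a circular aperture antenna. The report does not specify an antenna type, so the aperture model, the 55% efficiency and the beamwidth constant of 70 degrees are all assumptions made for this sketch.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0   # m/s

def aperture_gain_db(diameter_m, frequency_hz, efficiency=0.55):
    """Approximate gain of a circular aperture: efficiency * (pi * D / wavelength)^2."""
    wavelength_m = SPEED_OF_LIGHT / frequency_hz
    gain = efficiency * (math.pi * diameter_m / wavelength_m) ** 2
    return 10.0 * math.log10(gain)

def beamwidth_deg(diameter_m, frequency_hz, k=70.0):
    """Rule-of-thumb half-power beamwidth: k * wavelength / D, in degrees."""
    wavelength_m = SPEED_OF_LIGHT / frequency_hz
    return k * wavelength_m / diameter_m

diameter_m = 0.3                  # hypothetical antenna size
for f in (1e9, 2e9, 4e9):
    print(f"{f / 1e9:.0f} GHz: gain {aperture_gain_db(diameter_m, f):.1f} dB, "
          f"beamwidth {beamwidth_deg(diameter_m, f):.1f} deg")
```

Under these assumptions each doubling of frequency adds about 6 dB of gain and halves the beamwidth, so the pointing accuracy required of the antenna rises with the carrier frequency.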

Although  an  HF  data  link  provides  much  greater  flexibility  in  that  there  is  no  need  to 
maintain  a  line  of  sight  with  the  vehicle,  there  are  two  major  problems  in  creating  such 
a  data  link  for  the  PUMA.  The  first  problem  is  the  limited  allocated  bandwidths  or 
channels  for  the  transmission  of  data  in  the  HF  spectrum.  These  HF  channels,  which 
are often referred to as voice channels, are 3 kHz wide. By using higher powered
transmitters to provide SNRs greater than 3 dB, the transmission rate can be extended
to 4.8 kbits/s. With the limited power available on the p-UAV, however, it is most
likely that the channel capacity will be limited to 2.4 kbits/s (a quarter of the standard
transmission rate for computer lines). For an HF data link to be viable, at a reduced
frame rate, a transmission rate in the vicinity of 10 kbits/s is needed.
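
To indicate the scale of compression these transmission rates imply, the following sketch computes the compression ratio needed to send 320X200, 8 bit grey scale images at 1 Hz (the HF case considered in the Executive Summary) over each of the rates quoted above. It is an idealised estimate that ignores protocol overhead and error-control coding, both of which would increase the required ratio.

```python
def required_compression_ratio(width, height, bits_per_pixel,
                               frame_rate_hz, channel_bit_rate):
    """Ratio of the uncompressed image data rate to the available channel rate."""
    raw_bit_rate = width * height * bits_per_pixel * frame_rate_hz
    return raw_bit_rate / channel_bit_rate

ratios = {}
for rate_bit_s in (2400, 4800, 10000):
    ratios[rate_bit_s] = required_compression_ratio(320, 200, 8, 1, rate_bit_s)
    print(f"{rate_bit_s / 1000:.1f} kbit/s -> about {ratios[rate_bit_s]:.0f}:1")
```

On these figures a single 2.4 kbit/s voice channel would need a ratio of over 200:1, whereas a rate near 10 kbit/s brings the requirement down to roughly 50:1.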

Single  Side  Band  (SSB)  transmission  systems  can  use  up  to  4  voice  channels  (12  kHz 
bandwidth),  two  above  the  carrier  frequency  and  two  below  [12].  Therefore,  it  is 
feasible  to  consider  wideband  HF  data  links,  although  in  practice,  problems  may  be 
encountered in obtaining approval to use more than one voice channel. However, as
the p-UAV is likely to be deployed in missions conducted in remote areas over
distances  up  to  30  km  from  the  GCS  [1],  this  approval  may  not  be  so  difficult  to  obtain. 

The  second  problem  in  using  HF  waves  is  concerned  with  their  propagation 
characteristics.  Sky  waves  are  particularly  useful  as  a  means  of  transmitting  signals 
over  distances  greater  than  150  km,  and  may  achieve  ranges  of  over  3,200  km, 
although  predicting  the  reflection  off  the  ionosphere  and  obtaining  consistent  long 
range communication can be difficult. For short distances up to 80 km the groundwave
mode is the appropriate form of HF communication, although certain conditions such
as manpack radio operations in dense wet terrain can limit the usefulness of this mode
to only a few km [16]. Furthermore, the gap beginning where the groundwave
becomes too weak for communication and ending where the sky wave returns to earth
has been considered as a region where HF communication is ineffective and is referred
to as the skip zone, which in dense mountainous terrain can range from 4 to 150 km
and hence, includes the operating radius of the p-UAV.

The  problem  described  in  the  previous  paragraph  can  be  overcome  by  directing  an  HF 
skywave  signal  within  a  narrow  band  of  frequencies  at  the  zenith  and  then  receiving 
the  reflected  wave  back  on  Earth  from  one  of  several  of  the  ionised  layers  in  the 
atmosphere  with  a  minimal  path  loss.  This  propagation  mode  is  referred  to  as  the  Near 
Vertical  Incidence  Skywave  (NVIS)  mode  and  has  been  used  by  the  US  army  since 
World  War  2.  The  NVIS  mode  can  be  used  to  eliminate  skip  zones  by  adjusting 
antenna heights and transmitter frequencies [17]. The best frequency of operation for
this mode lies in the 2 to 10 MHz frequency band [18]. The mode is also dependent on
the directivity and polarisation of the antenna and any ground wave present can cause
interference  effects.  Nevertheless,  an  HF-SSB  radio  with  modem  features  operating  in 
the  NVIS  mode  can  be  used  successfully  to  provide  satisfactory  communications  for 
low flying tactical aircraft over a 50 km (or greater) range in virtually any type of
terrain condition [18]. During daytime operations a lower power output can be used
although at night and during dawn a high power output must be used to overcome the
presence of noise and interference. With a 400 W transmitter the probabilities of
communications success are 1.00, 0.92 and 0.87 for operation during the day, at dawn
and at night respectively whereas with a 40 W transmitter these probabilities drop to
0.87, 0.59 and 0.73 respectively [18].

The NVIS mode must be considered in the following situations [16]:

(a) the area of operations is not conducive to ground wave communications, such as
mountainous terrain,

(b) tactical deployments that place stations in anticipated skip zones when using
whip antennas, frequency selection methods and operating procedures,

(c) operating in dense wet vegetation or other areas of high signal attenuation,

(d) prominent terrain features are not under friendly control,

(e)  operating  from  defiladed  positions, 

(f)  operating  against  enemy  groundwave  jammers  and  direction  finders,  and 

(g)  flying  close  to  the  ground. 

The above indicates that it is feasible to design an HF data link utilising either ground
waves  or  NVISs,  although  the  bandwidth  will  be  very  small  in  comparison  with  that 
from  a  data  link  using  carrier  frequencies  in  the  VHF,  UHF  or  SHF  bands.  However, 
the  UHF  and  SHF  links  are  limited  to  maintaining  a  LOS  with  the  p-UAV.  Another 
problem  with  data  links  using  higher  frequency  carrier  waves  is  that  they  require 
costly  and  heavy  components  for  operation  and  use  significantly  more  power  than 
their low frequency counterparts in the HF, MF and lower bands. Furthermore, if VHF
or  higher  frequency  carrier  waves  are  used  for  data  communication,  then  the  GCS  may 
require  a  directional  antenna  to  track  the  vehicle  as  in  the  case  of  Pointer  UAV  [19]. 

The  selection  of  an  appropriate  data  link  for  a  UAV  is  dependent  on  the  nature  of  the 
mission,  the  amount  of  power  the  vehicle  can  provide  and  the  vehicle's  size  and  design 
in  accommodating  the  antenna.  For  example,  an  LOS  data  link  would  be  able  to 
provide continuous moving pictures during surveillance operations of river banks and
nearby  areas  (currently  being  carried  out  by  Regional  Force  Surveillance  Units  [20])  or 
in  the  Protection  of  Vital  Assets  (PVA)  such  as  the  surveillance  of  air  field  perimeters. 
However,  in  missions  where  the  vehicle  is  flying  in  mountainous  or  dense  terrain,  it 
may  be  more  appropriate  to  use  HF  data  links  even  with  their  restricted  bandwidths. 
In addition to the amount of power which can be supplied by the vehicle, the antenna
dimensions  for  both  the  vehicle  and  the  GCS  must  be  considered.  For  example,  both 
these  factors  limit  TV  transmission  to  about  80  km.  To  transmit  beyond  this  distance, 
relay  stations  must  be  used,  which  simply  receive  the  signal,  amplify  it  and  then 
retransmit it.


4.  Data  Compression


Data compression can be defined as the collection of techniques which reduce the
amount of digital data needed to carry useful information. These techniques are essential for
the efficient handling of digital information. In addition, they may be useful in limiting
the effectiveness of jamming and may provide a means of transmitting several
channels over the bandwidth of a communication link where normally only one
uncompressed channel could be employed to transmit data. Image data differ from
other forms of data in that they are noisier and hence cannot be preserved exactly,
although they can be preserved sufficiently for the human visual system not to notice.

If digital TV images consisting of 512X512 picture elements or pixels requiring 6
bits/pixel are to be displayed on a screen at 25 frames per sec, then this would
correspond to a transmission rate of approximately 40 Mbit/s. Because the transmission rate is
determined by bandwidth, it is also affected by the type of modulation technique. For
example, for 3 bit codes 6 Mbit/s are sufficient to transmit a 1 MHz video signal on a
DPCM system compared with 14 Mbit/s for a PCM system [21]. Furthermore, a pulse
or signal that can assume n distinct states or levels carries information equal to the
logarithm of the number of choices, or simply log2(n) bits. For example, an octal pulse
(one with eight different voltage levels) can be represented by combinations of 3 bits
for each level.
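
The arithmetic in this paragraph can be checked with a short sketch. The function names are illustrative only and do not come from the report; the figures are those quoted above. The raw rate works out at about 39.3 Mbit/s, which the text rounds to 40 Mbit/s.

```python
import math

def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second

def bits_per_pulse(levels):
    """Information carried by one pulse with `levels` equally likely states."""
    return math.log2(levels)

rate = raw_bit_rate(512, 512, 6, 25)
print(f"{rate / 1e6:.1f} Mbit/s")   # 39.3 Mbit/s
print(bits_per_pulse(8))            # 3.0 (an octal pulse carries 3 bits)
```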

To convert a video signal into a digital one, it must be sampled and quantised.
Sampling the amplitude of a modulating signal must be carried out at regular time
intervals. According to the Nyquist criterion, if a waveform has a bandwidth of f Hz,
then it is possible to convey all the information in that waveform by 2f or more equally
spaced samples per second of the amplitude of the waveform [22]. In practice a
sampling rate of at least 4 times the video bandwidth is required for each signal [23].
Quantisation, on the other hand, is the assignment of approximate discrete intensity
values for the amplitudes of the sampled points. The best contrast performance for the
lowest number of coded bits per sample is achieved by using 3 bits corresponding to
eight intensity levels. Thus for a video bandwidth of 2.3 MHz, which corresponds to
the transmission of images with a resolution of 400X300 pixels at a frame rate of 20 Hz,
a typical sampling rate would be 9.2 MHz and thus, a bit rate of at least 27.6 Mbps
would be required assuming a three bit code (eight grey levels) for each sample [23].
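
As a rough illustration of the sampling and quantisation figures above, the following sketch computes the sampling rate and bit rate for a given video bandwidth and shows a uniform eight-level quantiser. The factor of 4 is the practical oversampling figure quoted in the text; the uniform quantiser and its `full_scale` parameter are assumptions made for illustration, not a description of the scheme in Ref. [23].

```python
def required_bit_rate(video_bandwidth_hz, bits_per_sample, oversampling=4):
    """Return (sampling rate in Hz, bit rate in bit/s) for a sampled video signal."""
    sampling_rate = oversampling * video_bandwidth_hz
    return sampling_rate, sampling_rate * bits_per_sample

def quantise(sample, levels=8, full_scale=1.0):
    """Uniform quantiser (assumed): map a sample in [0, full_scale] to a code 0..levels-1."""
    code = int(sample / full_scale * levels)
    return min(max(code, 0), levels - 1)   # clamp out-of-range samples

sampling_rate, bit_rate = required_bit_rate(2.3e6, bits_per_sample=3)
print(sampling_rate / 1e6, "MHz,", bit_rate / 1e6, "Mbit/s")   # 9.2 MHz, 27.6 Mbit/s
```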

Data  messages  are  particularly  susceptible  to  instantaneous  loss  of  signal  since  fading 
or  corruption  by  noise  can  result  in  the  loss  of  a  few  bits  that  can  destroy  the 
information  content.  The  most  important  characteristic  of  a  data  link  is  its  minimum 
Bit  Error  Rate  (BER)  as  opposed  to  the  minimum  received  Signal  to  Noise  Ratio  (SNR) 
for  an  analogue  channel.  The  difficult  part  in  constructing  a  data  link  is  minimising 
BERs  by  attempting  to  reduce  noise  power  independently  of  signal  power  because 
there is a limit on the amount by which the latter can be increased. Frost et al [8] state
that because the uplink of a UAV should respond correctly to the GCS commands, the
BER should be as small as possible (≈ 10^-5) whereas BERs as high as 0.001 may be
acceptable in the display of image data due to the high level of redundancy. The
amount of redundant data in a TV image has been estimated to be as high as 99.9% in
special cases. The large amount of redundancy is attributed to the fact that the Human
Visual System (HVS) responds most strongly to scene details of high contrast, i.e.
the edges of objects in an image. BER values of 0.001 may be achievable using ground
or sky waves but for the NVIS mode and the skip zone the BERs are likely to be much
worse for considerable periods.

The ability to measure the maximum amount of information per second that a system
can transmit cannot answer the question of which system or group of systems has
sufficient capacity to transmit a specified class of information bearing messages. To
answer this question the information content of a signal must be determined. For
example, the appropriate system to transmit a speech in English is determined by the
information content of the speech and the time available to complete transmission [4].
The information content in messages consisting of equally likely symbols is given by I
= M log2(N) where I is the information content in bits, M is the number of symbols in
the message and N is the number of distinct values that each symbol can take. However,
in most cases certain letters and combinations of letters occur more often than others,
for example, e occurs more frequently than z and u is more likely to appear after a q.
Thus, the information content of a message not only relates to the number of possible
signal combinations but also to their relative frequency of occurrence, which, in turn,
depends upon the source of the message. The measure of the amount of information
contained in a set of data yields the entropy of the information source producing the
data. Entropy is defined as the negative of the sum over all members of a symbol
alphabet in which each probability of occurrence is multiplied by its logarithm to base 2.
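
The entropy definition can be made concrete with a minimal sketch that estimates symbol probabilities from relative frequencies in a sample message. The function name and the test strings are illustrative assumptions; the negative sign makes the result non-negative because the logarithm of a probability is never positive.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(symbols):
    """Shannon entropy H = -sum(p * log2(p)) with p estimated from the sample."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Four equally likely symbols need log2(4) = 2 bits each.
print(entropy_bits_per_symbol("abcd"))        # 2.0
# A skewed source carries less information per symbol.
print(entropy_bits_per_symbol("aaaaaaab"))
```

For the equally likely case the entropy multiplied by the message length reproduces I = M log2(N), here 4 x 2 = 8 bits.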

For  a  digital  video  image,  the  symbols  are  the  quantised  intensity  values  at  the  pixels. 
For  example,  in  8-bit  quantisation  (typical  for  video  quality  animation)  there  are  256 
symbols and the entropy is given in bits per pixel (bpp), which is known more
commonly  as  the  bit  rate.  In  general,  the  intensity  values  at  nearby  pixels  are  highly 
correlated  and  as  a  consequence,  digital  video  images  contain  much  redundant  data, 
thereby  yielding  a  lower  entropy. 

The need for data compression in a p-UAV wideband uplink can now be
demonstrated by considering the transmission of image data. For images of 320X200
pixels with 256 (8 bits) levels of grey on an LCD screen, a minimum of 0.51 Mbits/s are
required for a new frame every second. Additional bits are required to indicate the
start and end of each quantised level, the so-called start and stop bits. Therefore, 0.64
Mbits/s need to be transmitted to exhibit one frame per second. Furthermore, the
actual rate may be even greater if an additional bit is required for error bit checking,
the so-called error bit. However, the pre-allocated bandwidth for HF frequencies is
about 10 kHz. If the assumption is made that the entire bandwidth can be used for the
transmission of data, then to obtain 1 image per second on an LCD for an SNR of about
1, i.e. a channel capacity of 10 kbits/s, the data would need to be compressed by at
least a factor of 64. The required compression ratio is even greater for the transmission of
monochrome (black and white) images with a 640X400 resolution.
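
The factor of 64 follows directly from the figures in this paragraph, as the sketch below shows. It assumes one start bit and one stop bit per 8-bit pixel value, which is the reading that reproduces the quoted 0.64 Mbits/s; the function names are illustrative.

```python
def frame_bits(width, height, data_bits, framing_bits=0):
    """Bits in one frame when each pixel carries data bits plus framing bits."""
    return width * height * (data_bits + framing_bits)

def required_compression(bits_per_frame, frames_per_second, channel_bits_per_s):
    """Ratio by which the source rate exceeds the channel capacity."""
    return bits_per_frame * frames_per_second / channel_bits_per_s

raw = frame_bits(320, 200, 8)                      # 512,000 bits (0.51 Mbit)
framed = frame_bits(320, 200, 8, framing_bits=2)   # 640,000 bits (0.64 Mbit)
ratio = required_compression(framed, 1, 10_000)    # 10 kbit/s HF channel
print(raw, framed, ratio)                          # 512000 640000 64.0
```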

For naval vessels conducting surveillance operations over an open sea, mission
requirements may call for highly resolved images, for example, 640X400 pixels (VGA
quality) with 24-bit colour at TV frame rates of 25 Hz. In this situation an LOS data link
would be required. The given image resolution may be necessary since much of the
information displayed on the images received at the GCS would consist of open sea
and sky. On the other hand, Rejman [24] states that the effects of reducing frame rate
are:

(a)  manual  navigation  and  feature-flying  tasks  take  much  longer  to  accomplish, 

(b)  task  areas  are  covered  less  efficiently,  and 

(c)  the  task  of  target  detection  may  be  performed  poorly. 

Thus,  the  frame  rate  should  be  kept  as  high  as  possible  even  though  in  some  cases 
reducing  it  may  be  necessary  to  accommodate  the  transmission  of  image  data  along  a 
narrow  pre-allocated  bandwidth  such  as  an  HF  data  link. 

To transmit VGA images, which only allow 16 colours (4 bits) per pixel, a transmission
rate of 25 Mbits/s is required for a frame rate of 25 Hz. This transmission rate exceeds
greatly the channel capacity of TV channels whose bandwidths are typically 5 MHz.
For a system with an SNR of 10 dB, which is indicative of the minimum for a digital
system, the channel capacity for a bandwidth of 5 MHz would be 17 Mbits/s. Thus,
VGA images need to be compressed if the frame rate is kept at 25 Hz.
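
The capacity figures quoted in this section are consistent with the Shannon-Hartley relation C = B log2(1 + SNR). The report does not name the formula, so its use here is an assumption, as is the 640X400 frame size taken for the VGA rate. A minimal sketch:

```python
import math

def capacity_bits_per_s(bandwidth_hz, snr_db):
    """Shannon-Hartley capacity for a bandwidth in Hz and an SNR in dB."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

hf = capacity_bits_per_s(10e3, 0)     # SNR of 1 (0 dB) in a 10 kHz HF allocation
tv = capacity_bits_per_s(5e6, 10)     # SNR of 10 dB in a 5 MHz TV channel
vga_rate = 640 * 400 * 4 * 25         # assumed 640X400 frame, 4 bits/pixel, 25 Hz

print(f"HF capacity: {hf / 1e3:.0f} kbit/s")     # 10 kbit/s
print(f"TV capacity: {tv / 1e6:.1f} Mbit/s")     # 17.3 Mbit/s
print(f"VGA rate:    {vga_rate / 1e6:.1f} Mbit/s")   # 25.6 Mbit/s
```

On these assumptions the VGA stream exceeds the TV channel capacity by a factor of roughly 1.5, so even modest compression would suffice for the LOS case, in contrast to the HF case.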

Even if the frame rate were reduced so that the amount of data transmitted to the GCS
could  be  accommodated  by  a  TV  channel,  there  are  other  important  reasons  for 
compressing  sensor  payload  data,  provided  that  the  resulting  images  do  not  exhibit 
serious  degradation.  If  the  system  is  to  operate  in  an  environment  rendered  hostile  by 
the  use  of  Electronic  Counter  Measures  (ECM)  such  as  jamming,  then  a  reduction  in 
signal bandwidth would reduce the vulnerability in the transmission of sensor data. In
addition,  the  lower  the  signal  bandwidth,  the  lower  the  power  required  to  transmit 
data.  Thus  the  transmission  of  less  redundant  information  can  be  exploited  to  produce 
an  effective  improvement  in  SNR,  i.e.  reducing  the  bit  rate  is  equivalent  to  increasing 
transmitter  power.  This  is  particularly  important  for  UAVs  with  limited  power  sources 
such  as  the  p-UAV  investigated  by  CK  [1]. 

There  are  numerous  data  compression  techniques  for  compressing  binary  data  [2], 
which can be categorised as follows:

(a)  reversible  or  information-lossless  image  compression.  Here  the  original  digital 
representation  of  an  image  can  be  fully  reconstructed  at  the  receiver  from  the 
compressed  data.  Examples  include  run-length  coding,  contour  coding,  Huffman 
coding, arithmetic coding and conditional replenishment.

(b) predictive methods. These involve predicting the intensity value at a given pixel
based on the values of previously processed pixels. Examples include the
above-mentioned DPCM, Delta Modulation (DM), and Motion Compensation (MC).

(c) block methods. Here the image is subdivided into blocks, which are then
processed in a variety of methods. For example, in Vector Quantisation (VQ) the
blocks are compared to a codebook of vectors and the code with the closest match is
transmitted while in block truncation coding the value at each pixel in a block is
coded as a 0 or a 1 depending on whether it is above or below a chosen threshold.
Digital data such as character strings cannot be transmitted using vector
quantisation because small changes in the numerical value of a character lead to
enormous changes in meaning.

(d)  Human  Visual  System  (HVS)  compensation.  These  techniques  attempt  to 
compress video images by eliminating data not perceptible to the HVS, even if the
data  are  important  from  an  information  theory  point  of  view.  Some  techniques 
apply  a  model  of  the  HVS  directly  to  the  image  data  whereas  others  have  been 
developed  to  represent  as  many  features  of  the  HVS  as  possible.  Examples  include 
the  method  of  synthetic  highs,  pyramid  coding,  regional  growing  and  directional 
decomposition. 

(e) transform coding. This information lossy technique uses a mathematical operator
to produce an array of uncorrelated or nearly uncorrelated data from the highly
correlated data representing a digital image. Examples include the Karhunen-Loeve
Transform (KLT), the Discrete Cosine Transform (DCT), the Slant
Transform and the Hadamard Transform.

(f) hybrid techniques. These consist of a mixture of the techniques described above. An
example is the DCT/VQ, which involves using VQ on the DCT coefficients.
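
Of the categories listed above, block truncation coding in (c) is simple enough to sketch in a few lines. The version below is a minimal illustration under stated assumptions: the threshold is taken to be the block mean, and the decoder replaces each bit with the mean of the pixels on that side of the threshold. Neither choice is specified in the report, and the function names are hypothetical.

```python
def btc_encode(block):
    """Encode a block (list of rows) as a 0/1 bitmap plus two grey levels."""
    pixels = [p for row in block for p in row]
    threshold = sum(pixels) / len(pixels)          # assumption: block mean
    bitmap = [[1 if p >= threshold else 0 for p in row] for row in block]
    high = [p for p in pixels if p >= threshold]   # never empty: max >= mean
    low = [p for p in pixels if p < threshold]
    high_level = round(sum(high) / len(high))
    low_level = round(sum(low) / len(low)) if low else high_level
    return bitmap, low_level, high_level

def btc_decode(bitmap, low_level, high_level):
    """Rebuild the block by substituting a grey level for each bit."""
    return [[high_level if bit else low_level for bit in row] for row in bitmap]

block = [[10, 10, 200, 200],
         [10, 10, 200, 200],
         [12, 12, 198, 198],
         [12, 12, 198, 198]]
bitmap, low, high = btc_encode(block)
print(bitmap[0], low, high)        # [0, 0, 1, 1] 11 199
```

A 4X4 block of 8-bit pixels (128 bits) is reduced here to 16 bitmap bits plus two 8-bit levels, i.e. 32 bits, at the cost of flattening detail within each group.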

Performance of data compression techniques is evaluated in terms of the Mean Square
Error (MSE) or variants of it. The MSE is essentially an error measure consisting of the
sum of the square of the intensity differences between the reconstructed and original
images divided by the total number of intensity values (N^2 for an NXN image). In later
sections we shall use a variant, the Peak Signal-to-Noise Ratio (PSNR). This is given by

$$ \mathrm{PSNR} = 20 \log_{10}(b / d_{rms}), $$

where b is the largest possible value of the signal (typically 255) and $d_{rms}$ is the root
mean square difference between two images f and g defined as

$$ d_{rms} = \sqrt{ \iint \bigl( f(x,y) - g(x,y) \bigr)^2 \, dx \, dy }. \qquad (1) $$

In the above, (x,y) refers to the position coordinates of both images.
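
For digital images the PSNR can be computed as in the sketch below. It assumes that the integral in Eq. (1) is replaced by the mean of the squared differences over all pixels, which is the usual discrete form; the function names are illustrative.

```python
import math

def rms_difference(f, g):
    """Root mean square difference between two equal-sized images (lists of rows)."""
    squares = [(a - b) ** 2
               for row_f, row_g in zip(f, g)
               for a, b in zip(row_f, row_g)]
    return math.sqrt(sum(squares) / len(squares))

def psnr(f, g, peak=255):
    """PSNR in dB; infinite when the two images are identical."""
    d = rms_difference(f, g)
    if d == 0:
        return math.inf
    return 20 * math.log10(peak / d)

original = [[100, 102], [104, 106]]
decoded = [[101, 103], [105, 107]]      # every pixel off by one grey level
print(round(psnr(original, decoded), 2))   # 48.13
```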

Lossless techniques cannot offer the high compression ratios required for the p-UAV
data link and of the remaining techniques the one that has become the most prominent
is the DCT. This technique offers both high compression ratios and a low MSE [2],
which the other transform techniques cannot offer. At the same time it meets the JPEG
standard. However, there are two relatively new transform coding techniques which
promise even higher compression ratios and good fidelity. These are fractal and
wavelet transform coding techniques. In what follows we aim to discuss these
techniques in detail and review the current state of the art in employing these
techniques in image data compression.


5.  The  Discrete  Cosine  Transform 


We begin our study of the image data compression techniques mentioned at the end of
the previous section with the DCT, which was first employed by Ahmed et al [25]. The
DCT is an orthogonal transformation in that mathematical operators are used to form a
complete orthogonal set of unique basis vectors. The transform acts to 'pack' a large
number of highly correlated image data samples into a smaller number of uncorrelated
coefficients [2]. Of the three techniques that we aim to review, the DCT algorithm is
the only one that meets the JPEG standard for sequential lossy compression according
to page 219 of Ref. 26 and as a consequence, it has become synonymous with JPEG
compression.

A major advantage of the DCT is that its basis vectors are known. Hence these do not
need to be calculated for every transform block, thereby reducing the encoding time.
They also do not need to be transmitted together with the coefficients, thereby
reducing the transmission time. Another advantage of the DCT is that there are
already several fast algorithms for computing it [2].

DCT image compression involves dividing the original image into smaller NxN blocks
and then transforming these blocks via the Forward Discrete Cosine Transform
(FDCT) into equal-sized blocks of coefficients in the frequency domain. Because the
DCT employs the same basis vectors for each transform block, they only need to be
evaluated for the first transform block with a lookup table being used for the other
blocks. Data compression is achieved by assigning fewer bits to the coefficients in
order to remove redundant information by two methods. First, threshold
sampling is used so that all coefficients above a certain magnitude are retained while
those below the threshold value are set equal to zero. Second, the NxN array is
compressed further by undergoing rounding-off or quantisation of the pixel intensity
levels. The degree of quantisation is greater for the higher frequency coefficients since
the human eye is more sensitive to rounding-off at the lower frequencies. The resulting
data are encoded via a lossless technique such as arithmetic or Huffman coding to
avoid the loss of time experienced in transmitting the many coefficients that become
zero after quantisation. The data can then be transmitted over a communication link
with some error correcting code to enable decoding at the receiver by applying the
Inverse DCT (IDCT), which gives a representation of the original image [2].
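
The steps described above (forward transform, thresholding, frequency-dependent quantisation and inverse transform) can be sketched for a single 8x8 block as follows. This is a plain-Python illustration under stated assumptions rather than the report's implementation: the orthonormal DCT-II basis is used, and the threshold value and the `step_size` rule (a step that grows linearly with frequency index) are hypothetical choices, not the JPEG quantisation table.

```python
import math

N = 8  # block size (NxN in the text)

def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix (row u = frequency, column x = sample)."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

C = dct_basis()                       # evaluated once, reused for every block
CT = [list(col) for col in zip(*C)]   # transpose of C

def fdct(block):
    """Forward 2-D DCT: C * block * C^T."""
    return matmul(matmul(C, block), CT)

def idct(coeffs):
    """Inverse 2-D DCT: C^T * coeffs * C."""
    return matmul(matmul(CT, coeffs), C)

def step_size(u, v, base=8.0):
    """Hypothetical quantiser step: coarser for higher frequencies."""
    return base * (1 + u + v)

def compress_block(block, threshold=4.0):
    """Zero the small coefficients, then quantise the remainder to integers."""
    coeffs = fdct(block)
    return [[0 if abs(c) < threshold else round(c / step_size(u, v))
             for v, c in enumerate(row)] for u, row in enumerate(coeffs)]

def decompress_block(quantised):
    """Rescale the quantised coefficients and apply the inverse DCT."""
    rescaled = [[q * step_size(u, v) for v, q in enumerate(row)]
                for u, row in enumerate(quantised)]
    return idct(rescaled)
```

For a uniformly grey block only the DC coefficient survives, which is why smooth image regions compress so well: one integer stands in for 64 pixel values.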

When the DCT is implemented using the JPEG standard, the image is first partitioned
into 8x8 blocks and the FDCT is applied to these [27]. In JPEG compression,
thresholding and quantisation occur together in one matrix. The DC term, a DCT
coefficient representing the mean pixel value for each block, is differenced from the
DC term of the preceding block in a scanning order and the remaining coefficients are
passed to an entropy encoder. An entropy coding scheme, typically Huffman coding,
is then employed to assign codewords to coefficients in such a way that short
codewords are assigned to the more frequent terms while longer ones are assigned to
the rarer terms. The values are encoded in a zigzag manner as there is a high
correlation between values along this zigzag scan. Decompression is accomplished by
applying the inverse of each step in the opposite order [26,28]. For a discussion on the
mathematical details concerning the implementation of the DCT the reader is referred
to Appendix A.
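
The DC differencing and zigzag scan described above can be sketched as follows. This is a simplified illustration: the run-length step pairs each non-zero AC value with the number of zeros preceding it and ends with an end-of-block marker, but it omits the category codes, the 16-zero run symbol and the Huffman tables of the actual JPEG standard. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                     # s = row + col of a diagonal
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # odd diagonals run downwards to the left, even ones upwards to the right
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order

def differential_dc(dc_terms):
    """Replace each block's DC term by its difference from the previous block."""
    previous, out = 0, []
    for dc in dc_terms:
        out.append(dc - previous)
        previous = dc
    return out

def run_length_ac(block):
    """(zero_run, value) pairs for the AC terms along the zigzag, then 'EOB'."""
    ac = [block[r][c] for r, c in zigzag_order(len(block))][1:]
    pairs, run = [], 0
    for value in ac:
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                            # trailing zeros are implied
    return pairs

print(zigzag_order()[:6])   # [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
print(differential_dc([100, 104, 101]))   # [100, 4, -3]
```

Because quantisation leaves most high-frequency terms at zero, the zigzag scan tends to gather them into one long trailing run that the end-of-block marker represents in a single symbol.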


5.1  Improvements  in  the  DCT 

JPEG compression can produce undesirable blocking artefacts for high compression
rates. That is, if the amount of thresholding and quantisation is too severe, then
sub-block boundaries may appear in the reconstructed image. At the same time setting too
many high frequency coefficients to zero can lead to a loss in resolution. As a
consequence, much activity is being directed at developing non-standard methods of
employing the DCT and using JPEG compression as the bench-mark. Rather than
rounding off to the nearest integer values after dividing by quantisation coefficients as
in JPEG compression (see Appendix A), Eude et al [29] have recently proposed using a
mixture of Gaussian distributions on DCT coefficients in their search for a better
means of quantisation. They found that by approximating the high frequency DCT
coefficients by a single Gaussian distribution and the low ones by a mixture of two or
three Gaussian distributions, they were able with their new quantisation matrix to
remove blocking effects present in JPEG compression.

Another area where DCT image compression is being improved is in the acceleration
of the algorithmic process. For example, in Ref. [30] Hung and Meng describe two
methods for accelerating the computation of the inverse DCT (see Appendix A) by
exploiting the sparseness of the quantised transform coefficients. One method referred
to as the Symmetric Mapped Inverse DCT (SMIDCT) can perform up to three times
faster than the Forward Mapped Inverse DCT (FMIDCT) [31], the previous best
optimisation of the inverse DCT for sparse matrices [32]. In addition, Jung and Mitra
[33] have developed a method that not only reduces the blocking effect mentioned
above, but also accelerates the computation of DCT coefficients. Basically their method
involves decomposing the computation of an N-point DCT into a computation of an
(N/2)-point DCT and an (N/2)-point Discrete Sine Transform (DST). Jung and Mitra
refer to this method as SubBand DCT (SB-DCT) decomposition and have tested the
method on images with a 256x256 resolution. They have demonstrated that their
SB-DCT method not only matches JPEG coding with respect to PSNR but also performs at
least twice as fast. In addition, the compressed images exhibit far fewer blocking
effects than the corresponding JPEG compressed images.

Khataie and Soleymani [27] have proposed two different two-stage image compression
schemes aimed at achieving better quality images than JPEG compression for
moderate to high PSNRs. In the first stage both schemes process the more important
low frequency components of the image through transform coding while the high
frequency components lost in the first stage are encoded in the second stage. The first
scheme employs a DCT algorithm combined with a high rate Lattice-based Vector
Quantiser (LVQ) algorithm in the first stage while the second scheme employs a
standard JPEG encoder. For a description of LVQ schemes, most of which employ the
algorithm designed by Linde, Buzo and Gray (known as the LBG algorithm) [34], the
reader is referred to Ref. [2]. In the second stage each scheme processes the "error" of
the residual image formed by subtracting the output of the first stage from the original
image. Khataie and Soleymani use the same low-rate LVQ in both schemes. They find
that for PSNRs of greater than 38 dB the first scheme is much superior to JPEG
compression. For example, they achieve 1.2 bpp for a PSNR of 40.56 dB as opposed to 1.8
bpp via JPEG compression. They also find that although 0.7 bpp can be achieved at a
PSNR of 36.0 dB using JPEG compression compared with 1.05 bpp from the second
scheme, the latter outperforms JPEG compression significantly for high quality
compression. For example, 1.6 bpp can be achieved for a PSNR of 47.73 dB via the second
scheme whereas JPEG requires 3.9 bpp. Khataie and Soleymani conclude that both
schemes offer a considerable improvement over standard JPEG compression for
moderate to high quality compression. By using either scheme more than ninety five
per cent of input information can be retrieved with a bit rate less than 2 bpp. However,
these authors do not discuss the amount of time involved in processing images via
both schemes. In addition, although these techniques are still under development, for
them to be viable for the p-UAV much higher compression ratios are required.

In regard to reducing the processing time Walmsley et al [28] have proposed a pruning
algorithm in which a smaller proportion of DCT values are calculated. That is, instead
of calculating DCT values for an 8X8 image block, they only find it necessary to
calculate the DCT values for a 4X4 subset whilst simultaneously maintaining an
acceptable image quality. Their pruning algorithm requires a total of 82 multiplications
and 227 additions for a block compared with 192 multiplications and 464 additions
using the standard row-column approach. Another advantage is that parallelisation
can be performed during stages of the algorithm because two or more processors can
be invoked to calculate separate data partitions that arise from the decomposition of
the DCT. When applied to the JPEG standard the pruning algorithm not only
accelerates image compression as a result of calculating fewer DCT coefficients but also
produces higher compression ratios with negligible degradation in image quality due
to the fact that there is only one long length of zeros along the zigzag scan of the
encoder. Walmsley et al find that for an acceptable loss in image quality, i.e. a pruning
value of 4, their algorithm results in a speed up of over 50% relative to JPEG compression.
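
The idea of pruning can be illustrated by evaluating only the low-frequency corner of the transform. The sketch below computes the `keep` x `keep` coefficients directly from the DCT definition and leaves the rest at zero; it is not the fast algorithm of Walmsley et al and does not reproduce their operation counts, and the function name is hypothetical.

```python
import math

def pruned_dct(block, keep=4):
    """2-D DCT of an n x n block, evaluating only the keep x keep low-frequency terms."""
    n = len(block)

    def basis(u, x):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        return scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))

    coeffs = [[0.0] * n for _ in range(n)]   # unevaluated terms stay at zero
    for u in range(keep):
        for v in range(keep):
            coeffs[u][v] = sum(basis(u, x) * basis(v, y) * block[x][y]
                               for x in range(n) for y in range(n))
    return coeffs

flat = [[50] * 8 for _ in range(8)]
coeffs = pruned_dct(flat, keep=4)
print(round(coeffs[0][0], 6))   # 400.0 (DC term of a uniform block of 50s)
```

With a pruning value of 4 only 16 of the 64 coefficients of an 8X8 block are evaluated, which is the source of the saving described above.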

Another area of much interest is the introduction of new architectures designed to use
less area on each processing chip. For example, Wang and Chen [35] have proposed
using systolic arrays in DCT computation, which have received much attention ever
since they were introduced by Kung and Leiserson [36] in the design of high speed
signal processing systems. Systolic systems possess the desirable features of regularity,
modularity and concurrency, thereby enabling parallel computing architectures to be
created from them which are necessary in meeting the realtime requirements for the
transmission of image data. On the other hand, Mariatos et al [37] have introduced a
novel architecture employing the Coordinate Rotation Digital Computer (CORDIC)
Circular Rotation Algorithm. The CORDIC algorithm is based on the decomposition of
the DCT matrix into rotations. The new architecture requires less than 40% of the area
of previous CORDIC architectures to perform DCT computations and when
extensively pipelined (up to 80 pipeline stages can be set) it can process the fast signals
of High Definition Television (HDTV). Mariatos et al have adopted 2-bit digit-serial
arithmetic to bring about a reduction in hardware. The chip can perform at a
throughput rate of 500 MHz or 250 Mpixels/s and needs about 2.6K gates.
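
The rotation primitive on which CORDIC architectures rely can be sketched in software as below. This is the generic circular rotation mode, not the specific architecture of Mariatos et al; floating point is used for clarity, whereas in hardware the multiplications by powers of two become bit shifts and the arctangent values come from a small lookup table.

```python
import math

def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians (|angle| up to about 1.74) by shift-and-add steps."""
    # Each micro-rotation stretches the vector; the total gain is known in advance.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1 + 2.0 ** (-2 * i))
    z = angle                                   # angle still to be rotated
    for i in range(iterations):
        d = 1 if z >= 0 else -1                 # rotate towards zero residual
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x / gain, y / gain

cx, cy = cordic_rotate(1.0, 0.0, math.pi / 3)
print(round(cx, 4), round(cy, 4))   # 0.5 0.866
```

Since a DCT matrix can be factored into plane rotations, a chain of such units can evaluate the transform without general-purpose multipliers, which is what makes the approach attractive for compact hardware.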

Although  the  DCT  has  become  the  most  popular  image  compression  technique  mainly 
because  of  its  implementation  in  the  JPEG  standard,  from  the  preceding  material  it  can 
be  seen  that  there  is  still  room  for  improvement,  especially  in  the  transmission  of 
HDTV  signals  or  in  the  transmission  of  image  data  over  relatively  narrow  bandwidths 
with  a  requirement  for  a  near  realtime  capability  such  as  our  p-UAV  application. 
Current  activity  is  concerned  mainly  with  reducing  the  processing  time  although 
improvements  in  the  quantisation  process  may  lead  to  marginally  higher  compression 
ratios.  The  techniques  mentioned  above  should  be  monitored  as  they  will  undoubtedly 
lead  to  a  new  JPEG  standard  in  the  future.  For  our  p-UAV  application,  however,  we 
require  much  higher  compression  ratios  than  those  offered  at  present  by  the  DCT, 
particularly  if  we  wish  to  consider  HF  propagation.  Two  techniques  promising  higher 
compression  ratios  than  the  DCT  are  fractal  and  wavelet  transform  coding  and  we 
shall  investigate  the  current  state  of  the  art  of  these  techniques  in  the  following 
sections. 


6.  Fractal  Transform  Coding 

A fractal is a fragmented geometrical shape that can be continually subdivided into
parts, each of which is a reduced-size copy of the original shape. That is,
fractals are generally self-similar. Many real world objects that are not simple
geometric shapes, such as clouds, mountains and coastlines, can be described by
fractals because real world images possess local self-similarity as described in Ch. 1 of
Ref. 38. This means that only parts of images possess the same self-similar
transformations and hence, an image consists of properly transformed parts of itself.
These transformed parts seldom combine to form an exact copy of the original image.
As a consequence, an image encoded as a set of fractal transformations will not be an
identical copy, but an approximation, i.e. lossy.

It  should  also  be  noted  that  while  we  are  primarily  concerned  with  realtime 
applications  of  fractal  image  compression  in  this  report,  the  implementation  of  this 
technique  in  non-realtime  environments  has  already  met  with  remarkable  success.  In 
1992  Microsoft  released  a  compact  disc  known  as  Microsoft  Encarta,  which  is  a  popular 
multimedia  encyclopedia  containing  7  hr  of  sound,  100  animations,  800  colour  maps 
and  more  than  7000  pictures  all  encoded  in  less  than  600  Mb  of  data.  Microsoft  has 
been  able  to  achieve  this  astonishing  feat  using  fractal  image  compression  techniques. 
Thus, it is only a matter of time before consumers will use this technology to store their
valuable  pictures  on  compact  disc  rather  than  adopting  the  archaic  procedure  of 
storing  them  in  photographic  albums. 

Deterministic  fractals  possess  the  intrinsic  property  of  extremely  high  visual 
complexity  while  being  very  low  in  information  content  [39].  This  is  because  they  can 
be  generated  by  simple  recursive  deterministic  algorithms  and  it  is  this  property  that 
makes  them  a  useful  tool  in  image  compression.  Transformations  used  in  the 
description  and  pointers  to  image  regions  are  stored  rather  than  the  original  pixel 
image  data.  Thus,  fractal  transform  coding  yields  a  set  of  relations  based  on  the  spatial 
and  spectral  geometry  of  the  original  image,  which  describe  the  original  image  in 
terms  of  itself.  Fractal  images  not  only  provide  a  resolution  independent  image  of  the 
original,  but  can  also  yield  very  high  compression  ratios  [40].  For  a  description  of  the 
mathematical  concepts  leading  to  the  implementation  of  fractals  in  image  compression 
the  reader  is  referred  to  Appendix  B.  Here  we  shall  be  concerned  more  with  the  recent 
advances  employing  this  transform  coding  technique. 

We now describe how fractal image compression can be applied to a 256×256 image
in which each pixel is any of 256 (8 bit) levels of grey. Let R_1, R_2, ..., R_1024 be the 8×8
pixel nonoverlapping sub-squares of the image and let D be the collection of all 16×16
pixel (overlapping) subsquares of the image, which yields 58,081 squares. For each
range block R_i, a search is conducted through all the domain blocks of D to find the
block D_i which minimises the rms metric, i.e. the root mean square difference between
the image over R_i and the transformed image over D_i.

That is, we find pieces D_i and maps w_i, so that when a w_i is applied to the part of
the image over D_i, something very close to the part of the image over R_i is obtained.
There are eight ways to map one square onto another, which means that
8 · 58,081 = 464,648 squares must be compared with each of the 1024 range squares. In
addition, a square in D has four times as many pixels as each R_i, so that either
subsampling, i.e. choosing one pixel from each 2×2 subsquare of D_i, or averaging the
2×2 subsquares corresponding to each pixel of R_i must be adopted before the metric can
be evaluated. Minimising the rms metric requires not only finding a D_i of the image that
looks most like the image above R_i but also finding good contrast and brightness settings
s_i and o_i for the PIFS transformation w_i discussed in Appendix B. For each D ∈ D, s_i
and o_i are computed by using least squares regression, which also gives a resulting rms
difference. The chosen D_i is the D ∈ D with the least rms difference.
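
The search described above can be sketched in a few lines of Python. This is an illustrative sketch only and is not taken from Ref. [38]: the function names are our own, the domain blocks are shrunk by 2×2 averaging, the eight orientations are generated from rotations and mirror images, and the contrast and brightness follow from an ordinary least squares regression of the range pixels on the domain pixels. No quantisation of the contrast and brightness, and no restriction on the size of the contrast, is applied here.

```python
import math

def average_2x2(block):
    """Shrink a 2n x 2n block to n x n by averaging each 2x2 subsquare."""
    n = len(block) // 2
    return [[(block[2 * r][2 * c] + block[2 * r][2 * c + 1] +
              block[2 * r + 1][2 * c] + block[2 * r + 1][2 * c + 1]) / 4.0
             for c in range(n)] for r in range(n)]

def isometries(block):
    """Return the eight rotations and mirror images of a square block."""
    out = []
    cur = [list(row) for row in block]
    for _ in range(4):
        out.append(cur)
        out.append([row[::-1] for row in cur])          # mirror image
        cur = [list(row) for row in zip(*cur[::-1])]    # rotate by 90 degrees
    return out

def fit_contrast_brightness(domain, rng):
    """Least squares s and o minimising the rms of (s * domain + o - range)."""
    a = [v for row in domain for v in row]
    b = [v for row in rng for v in row]
    n = len(a)
    sa, sb = sum(a), sum(b)
    saa = sum(x * x for x in a)
    sab = sum(x * y for x, y in zip(a, b))
    denom = n * saa - sa * sa
    s = 0.0 if denom == 0 else (n * sab - sa * sb) / denom
    o = (sb - s * sa) / n
    mean_sq = sum((s * x + o - y) ** 2 for x, y in zip(a, b)) / n
    return s, o, math.sqrt(mean_sq)

def best_match(rng, domains):
    """Exhaustive search; returns (rms, domain index, orientation, s, o)."""
    best = None
    for idx, dom in enumerate(domains):
        for k, cand in enumerate(isometries(average_2x2(dom))):
            s, o, rms = fit_contrast_brightness(cand, rng)
            if best is None or rms < best[0]:
                best = (rms, idx, k, s, o)
    return best
```

Calling best_match once per range block, with every 16×16 subsquare of the image as the list of domains, corresponds to the complete search whose cost is counted in the text.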

The selection of D_i together with the corresponding s_i and o_i allows the
transformation w_i to be put in matrix form. Once a collection w_1, w_2, ..., w_1024 has been
determined, the image can be decoded by estimating the attractor A as defined by
Equation (B3) in Appendix B. In general, not that many iterates are required to obtain
a representation of the original image, as Fisher shows on p. 15 of Ref. [38]. Here,
representations are presented after the first, second and tenth iterates with the final
one displaying all the essential features of the original 65,536 byte image. The
transformations that reconstitute the image require only 3968 bytes since each
transformation requires 8 bits in each of the x and y directions to determine the position of
D_i, 7 bits for o_i, 5 bits for s_i and 3 bits to determine a rotation and flip operation for
mapping D_i to R_i. The position of R_i is implicit in the ordering of the transformations.
Hence, a compression ratio of 16.5:1 is obtained with an rms error of 10.4. Each pixel is
on average only 6.2 grey levels from its correct value while with each iteration more
detail is added.
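
The storage figures quoted above can be checked with a short calculation. The sketch below simply repeats the arithmetic in the text; the function and parameter names are our own.

```python
def transform_budget(image_size=256, range_size=8,
                     bits_x=8, bits_y=8, bits_o=7, bits_s=5, bits_iso=3):
    """Return (encoded bytes, compression ratio) for fixed-size range blocks."""
    range_blocks = (image_size // range_size) ** 2      # 1024 transformations
    bits_per_map = bits_x + bits_y + bits_o + bits_s + bits_iso
    encoded_bytes = range_blocks * bits_per_map // 8
    original_bytes = image_size * image_size            # one byte per 8 bit pixel
    return encoded_bytes, original_bytes / encoded_bytes

encoded_bytes, ratio = transform_budget()
print(encoded_bytes, round(ratio, 1))   # 3968 16.5
```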

According  to  Jacquin  [39]  the  three  main  issues  involved  in  the  design  of  a  fractal 
block  coding  system  are; 

(a)  the  partitioning  of  an  image, 

(b)  the  selection  of  a  measure  of  the  distortion  between  two  images, 

(c)  the  specification  of  both  a  finite  class  of  contractive  image  transformations  defined 
with  a  partition  and  of  a  scheme  for  the  quantisation  of  their  parameters. 

In  the  remainder  of  this  section  we  shall  primarily  be  concerned  with  issues  (a)  and  (c) 
while  issue  (b)  is  discussed  in  Appendix  B. 


6.1  Image  Partitions 

In the previous subsection we presented most of the ideas of a practical fractal image
encoding scheme. An image is first partitioned by some collection of ranges R_i and then
for each R_i a domain block D_i from the collection of image pieces is sought that has a
low rms error when mapped to R_i. Once R_i and D_i are known, s_i and o_i of a
Partitioned Iterated Function System (PIFS) can be determined in addition to
a_i, b_i, c_i, d_i, e_i and f_i of the affine transformation. Eventually, a transformation
W = ∪_i w_i is obtained that encodes an approximation to the original image. So far, we
have concentrated on fixed-size range blocks R_i, but there are regions in the original
that are difficult to cover using this approach, for example, a person's eyes in a
photograph. Furthermore, there are regions that can be covered with a larger R_i,
thereby reducing the total number of maps w_i required. Optimal partitioning of an
image is not only capable of improving the quality of the reconstructed image, but can
also increase the compression ratio.

There  are  several  methods  of  partitioning  an  image,  some  of  which  we  describe  here. 
First,  Jacquin  [39]  presents  a  partitioning  technique  where  an  image  is  partitioned  by 
using  a  block  coding  design  based  on  the  theory  of  iterated  contractive  image 
transformations. The original image μ_orig is partitioned into domain cells and into
nonoverlapping square range cells of two different sizes forming a two-level square
partition. A partition constructed in this way is image-dependent, although it does
allow for the use of larger blocks to take care of smoothly varying image regions and
smaller ones to capture detail in intricate regions such as rugged boundaries and fine
textures.  The  domain  blocks  form  the  pool  D  consisting  of  all  image  blocks  and  these 
are  then  classified  according  to  their  features  as; 

(1) shade blocks, D_s,

(2) edge blocks, D_e, and

(3) midrange blocks, D_m.

Shade  blocks  do  not  possess  significant  gradients  and  are  not  used  as  domain  blocks. 
Hence,  they  can  be  removed  from  the  pool.  On  the  other  hand,  edge  blocks  possess 
strong  changes  of  intensity  and  are  split  further  into  simple  and  mixed  edges. 
Midrange  blocks  possess  moderate  gradients  but  no  definite  edges. 

Now consider an r×r digital image μ quantised to 256 grey levels. The original image μ
is partitioned into range cells R_i of two different sizes. The image
transformation can be represented as;

$\tau = \sum_{i=1}^{N} \tau_i$,

where the sum runs over the N range cells and S_i and T_i are the geometric and massic
parts of τ_i. First, the spatial contraction S_i must be constructed by selecting an image
domain block μ|D_i of size D×D, which will be contracted to a block S_i(μ|D_i) of size B×B.
The symbol μ|D_i represents the part of the image restricted to block D_i. The specification
of the domain cell D_i is equivalent to the description of the spatial contraction S_i. The
second part consists of finding the block transformation T_i which minimises the
distortion between T_i ∘ S_i(μ|D_i) and μ|R_i. The distortion measure used by Jacquin is the
rms distortion between image blocks. A pool of massic transformations T can now be
obtained. The encoding of the range blocks μ|R_i consists of utilising the
self-transformability by finding the best pair (D_i, T_i) ∈ D × T such that the distortion
d(μ|R_i, T_i ∘ S_i(μ|D_i)) is a minimum. By implementing this partitioning scheme Jacquin
was able to achieve a bit rate of 0.06 bpp with a Peak SNR (PSNR) of 31.4 dB for an 8 bit
512×512 resolution version of the image of Lena.
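
Results in this section are quoted either as an rms error in grey levels or as a PSNR in dB. The sketch below converts between the two for 8 bit images, assuming the commonly used definition PSNR = 20 log10(255/rms). The cited authors may normalise differently, so the conversion is indicative only and the function names are our own.

```python
import math

def psnr_from_rms(rms, peak=255.0):
    """PSNR in dB for a given rms error, assuming PSNR = 20*log10(peak/rms)."""
    return 20.0 * math.log10(peak / rms)

def rms_from_psnr(psnr_db, peak=255.0):
    """Inverse of psnr_from_rms under the same assumed definition."""
    return peak / (10.0 ** (psnr_db / 20.0))
```

Under this assumed definition, an rms error of 10.4 grey levels corresponds to a PSNR of roughly 27.8 dB.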

Jacquin's  method,  however,  expends  too  much  time  because  of  the  large  amount  of 
searching  in  the  domain  block  pool.  To  overcome  this  problem,  Bani-Eqbal  [41]  has 
devised a new technique for speeding up the search. He proposes an incremental
method that employs Jacquin's method, but limits the domain block pixels by
averaging to half their size. This is referred to as decimating [39]. He then flips them
and compares them with the range blocks. The domain blocks are arranged into a tree,
so that the tree can be navigated to select a small number of candidate blocks. By using
this method he is able to achieve for a 256×256×8 bit version of Lena a speed-up of
more than 50 times over Jacquin's complete search method without any noticeable
degradation  in  image  quality.  Specifically,  he  finds  that  it  takes  8750  s  to  encode  the 
image  on  a  SUN  Sparcstation  10  Model  30  using  Jacquin's  method  whereas  with  his 
method  it  only  takes  150  s  with  a  marginal  increase  in  rms  distortion  (6.7  for  the 
complete  search  method  as  opposed  to  8.7  with  his  method). 


6.2  Quadtree  Partitioning 

Fisher  et  al  [42]  were  first  to  introduce  adaptive  methods  in  the  encoding  process.  They 
employed  various  approaches  including  quadtree,  rectangular  and  triangular 
partitions  of  the  range  blocks  to  improve  the  fidelity  of  an  image.  They  also  pointed 
out that it is not necessary to impose strict contractivity conditions on each of the
coded transformations since the eventual contractivity of their union is sufficient to
ensure convergence of the iteration process during decoding.

In  quadtree  partitioning  a  square  is  divided  into  four  equal  sub-squares  when  it  is  not 
covered  sufficiently  by  some  domain.  The  process  continues  recursively  beginning 
with the entire image and continuing until the squares are small enough to be covered
within a specified rms tolerance. Small squares can be covered better than large ones
because adjoining pixels in an image tend to be highly correlated. Thus, an image can
be represented as a tree in which each node contains four subnodes, corresponding to
the four quadrants of the square while the root of the tree is the initial image.

An algorithm based on the above method can be developed by assuming that the
image contains 256×256 pixels. The collection of permissible domains D can be all the
sub-squares of 8×8, 12×12, 16×16, 24×24, 32×32, 48×48 and 64×64. Next the image is
partitioned recursively until the squares are 32×32. Then an attempt is made to cover
each square in a quadtree partition by a larger domain. The success of each
attempt is determined by meeting a predetermined tolerance value e. When this
condition is met, the square can be called R_i and the covering domain D_i. If the
condition is not met, then the square is subdivided and the process repeated.
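
The recursion described above can be sketched as follows. This is a minimal illustration rather than the algorithm of Ref. [38]: cover_error is a hypothetical callback standing in for the domain search and returning the best rms error found for a square, and min_size is an assumed stopping size that the text does not specify.

```python
def quadtree_partition(x, y, size, cover_error, tolerance,
                       min_size=4, max_size=32):
    """Return the (x, y, size) range squares covering the square at (x, y)."""
    if size > max_size:
        split = True                  # always subdivide down to max_size first
    elif size <= min_size:
        split = False                 # smallest square is accepted as it is
    else:
        split = cover_error(x, y, size) > tolerance
    if not split:
        return [(x, y, size)]
    half = size // 2
    ranges = []
    for dy in (0, half):              # visit the four quadrants in turn
        for dx in (0, half):
            ranges += quadtree_partition(x + dx, y + dy, half, cover_error,
                                         tolerance, min_size, max_size)
    return ranges
```

For a 256×256 image the call quadtree_partition(0, 0, 256, cover_error, tolerance) returns 64 squares of 32×32 when every square is covered within the tolerance, and progressively smaller squares wherever it is not.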

According  to  Fisher  [38]  the  algorithm  works  well,  but  can  perform  even  better  if  the 
domain  pool  includes  diagonally  oriented  squares.  As  an  example,  Fisher  states  that 
for  a  256X256  grey  image  of  a  collie  the  quadtree  scheme  yields  a  compression  ratio  of 
28.95:1  with  an  rms  error  of  8.5.  However,  the  domain-range  comparison  step  of  the 
encoding  is  computationally  intensive  and  so  a  classification  scheme  is  invariably  used 
to  minimise  the  number  of  comparisons. 

Several  classification  schemes  exist.  One  of  these  schemes  is  the  block  method  used  by 
Jacquin [39], as already described, while another is archetype classification [38]. Here
an archetype A is determined by searching through the entire domain set to find that
member of the set that covers the other members best. Here covering means that both
the domain and corresponding transformation which result in an accurate mapping to
each range are found. Archetype classification is similar to determining a Vector
Quantisation  (VQ)  codebook,  but  a  major  difference  is  that  the  transformation,  w,  is 
included  in  the  process  of  determining  archetypes. 

In  Chapter  4  of  Ref.  [38]  Boss  and  Jacobs  introduce  an  archetype  classification  scheme 
which  is  subsequently  employed  to  encode  the  standard  Lena  image.  They  describe 
how five different sets can be generated, each consisting of 72 archetypes. Three of the
archetype sets are determined from three sets containing five qualitatively dissimilar
256×256×8 images while the remaining two are determined from sets of five
qualitatively similar images. None of the sets of images contains the Lena image or any
other test image. Boss and Jacobs show that when the number of classes from each
archetype set is the same as conventional block classification schemes [43], the latter
are able to encode images much faster than the archetype method, although image
fidelity or rather, the PSNR, is significantly better using the former method. As a
consequence, the number of archetype classifications can be lowered, which not only
yields a better PSNR, but also reduces the encoding time to below that for the
conventional scheme with its higher number of searched classes. By considering only
six searched classes for the archetype method Boss and Jacobs show that the Lena
image takes about 200 s to encode on an Apollo 4500 workstation yielding a PSNR of
24.25  whereas  with  24  searched  classes  in  their  conventional  scheme  encoding  of  the 
Lena  image  takes  over  300  s  yielding  a  PSNR  of  24.1. 


6.3  HV-Partitions 


A  deficiency  in  quadtree  partitioning  is  that  no  attempt  is  made  to  select  the  domain 
pool  D  in  a  content-dependent  manner.  The  selected  collection  must  be  very  large  to 
enable a good fit to the given range. A technique to overcome this deficiency whilst
simultaneously increasing the flexibility of the range partition is to employ
HV-partitioning. In an HV-partition a rectangular image is recursively partitioned either
horizontally  or  vertically  to  form  two  new  rectangles  until  a  covering  tolerance  is 
satisfied  as  in  quadtree  partitioning.  This  technique  is  much  more  flexible  since  the 
position  of  the  partition  is  variable,  thereby  allowing  the  partitions  to  share  some  of 
the  self-similar  structure.  For  example,  partitions  can  be  arranged  so  that  edges  in  the 
image  will  run  diagonally  through  them.  It  is  then  possible  to  use  the  larger  partitions 
to  cover  the  smaller  ones  with  the  expectation  of  obtaining  a  good  cover.  For  a  more 
detailed  description  and  variation  of  this  technique  the  reader  is  referred  to  Chapter  6 
of  Ref.  [38]. 

When encoding with HV-partitions, the same two basic steps are required as for the
quadtree method. These are;

(1)  recursive  partitioning  to  establish  nonoverlapping  ranges,  and 

(2)  domain  searching  to  determine  the  domain  that  will  map  onto  a  particular  range. 

Each  pixel  in  the  original  image  is  assigned  to  exactly  one  range  through  partitioning, 
but  it  can  appear  in  multiple  domains,  which  are  typically  two  to  three  times  greater 
than  the  ranges.  As  before,  the  affine  transformation  of  the  pixel  values  must  be 
contractive. 

In HV-partitioning, however, the average pixel value for each row and column of
pixels, of the particular range undergoing partitioning, is calculated. These averages
are used to compute successive differences between the averages. Then a linear biasing
function is applied to each of these differences, which multiplies them by their
distance from the nearest side of the rectangular range. That is, if the range contains
pixel values r_{i,j} for 0 ≤ i < N and 0 ≤ j < M, then the horizontal sums,

$\sum_{i} r_{i,j}$,

and vertical sums,

$\sum_{j} r_{i,j}$,

are computed, subtracted and multiplied by the biases,

$\min(j, M-j-1)/(M-1)$ and $\min(i, N-i-1)/(N-1)$,

respectively. The first partition is found by determining the maximum value of all the
biased horizontal and vertical differences so that it is either located at horizontal
position j or at vertical position i, depending on which yields the larger biased
difference. This yields two rectangles which tend to partition the given range along the
strong vertical or horizontal edges while avoiding narrow rectangular partitions.
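
A minimal Python sketch of this split selection is given below. It is our own illustration and not the code of Ref. [38]: the differences are taken in absolute value, which is an assumption, and the returned labels "column" and "row" are hypothetical names indicating that the cut lies after column j or after row i.

```python
def hv_split(block):
    """block[i][j] with N rows and M columns; return (axis, index) of the cut."""
    n, m = len(block), len(block[0])
    col_sums = [sum(block[i][j] for i in range(n)) for j in range(m)]
    row_sums = [sum(block[i][j] for j in range(m)) for i in range(n)]
    best = (-1.0, None, None)
    for j in range(m - 1):
        bias = min(j, m - j - 1) / (m - 1)        # zero at the sides of the range
        score = bias * abs(col_sums[j] - col_sums[j + 1])
        if score > best[0]:
            best = (score, "column", j)
    for i in range(n - 1):
        bias = min(i, n - i - 1) / (n - 1)
        score = bias * abs(row_sums[i] - row_sums[i + 1])
        if score > best[0]:
            best = (score, "row", i)
    return best[1], best[2]
```

Because the bias vanishes at the sides of the range, a strong edge lying next to a side is never chosen, which is how narrow rectangles are avoided.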

The  domain  search  is  almost  the  same  as  the  quadtree  method.  Once  a  rectangle  is 
divided, a domain is sought for the largest currently uncovered range. Unlike the
quadtree method, it is not the rms difference but the square difference of the pixel
values that is compared with a predetermined threshold to determine when
partitioning takes place. If the square difference is smaller than the threshold, then the
transformation is accepted, otherwise the range is partitioned into two new ranges.

The encoding time can also be accelerated by employing one of the following methods;

(1)  quadrant  classification, 

(2)  encoding  by  range  size,  or 

(3)  domain-range  ratio  restriction. 

Decoding  can  be  performed  by  a  more  efficient  method  than  the  standard  method  of 
iteration  to  a  fixed  point.  This  more  efficient  method  involves  pixel  referencing  and 
low-dimensional  fixed  point  approximation.  For  more  details  of  the  above  methods, 
the reader is referred to the article by Fisher and Menlove in Chapter 6 of Ref. [38].

The  Fisher  and  Menlove  technique  can  be  applied  at  various  optimisation  levels 
ranging  from  0  to  8  with  each  level  resulting  in  a  decrease  in  the  number  and  type  of 
comparisons  to  be  performed.  At  high  optimisation  levels  very  rapid  compression  is 
achieved  yielding  high  compression  ratios,  but  not  very  good  fidelity.  Specifically, 
Fisher  and  Menlove  obtained  the  following  results; 


Table 1 Results obtained by Fisher and Menlove

Level    Compression Ratio    PSNR (dB)    Encoding Time (secs)
5        14.7                 34.62        979.9
2        39.4                 30.99        1122.1
6        80.2                 28.15        42.5

By  inspecting  the  reproduced  images  they  were  able  to  rule  out  the  final  case  as  an 
acceptable  level.  In  addition,  they  found  that  the  relationship  between  the  PSNR, 
compression  ratio  and  encoding  time  is  linear  on  a  log  scale. 

Popescu  and  Yan  [44]  have  also  developed  an  adaptive  block  splitting  scheme  which  is 
more  flexible  than  a  quadtree  method.  The  image  is  split  into  blocks  according  to  a  tree 
structure with the base or roots consisting of the initial 24×24 block splits of the image.
Branches are formed by two partitioning attempts; the first creates nine 8×8 blocks and
the second four 12×12 blocks. On the first branch each 8×8 block is searched for a
match in the pool of domain blocks. If none is found, then further branches are created
by splitting the 8×8 block into four 4×4 blocks. On the second branch each 12×12 block
is searched for a match in the pool of domain blocks and if none is found, then the next
level is investigated, which consists of either four 6×6 blocks, one 8×8 and five 4×4
blocks, or nine 4×4 blocks. The path producing the shortest code is eventually selected.
Popescu and Yan state that this splitting strategy produces optimal results compared with
quadtree partitioning and have applied their method to a colour image of a fish
achieving a high compression rate of 31.44 and a PSNR of 36.14, but the encoding time
is  not  given. 


6.4  The  Bath  Fractal  Transform 

Monro  and  Dudbridge  [45]  have  developed  the  Bath  Fractal  Transform  (BFT)  method 
of encoding rectangular grey-scale image blocks which eliminates the need for
searching. Zero searching fractal transforms are particularly important because both
the coding and decoding speeds are fast. The image is tiled with reduced copies of
itself using a least-squares approximation to derive an optimal mapping or set of affine
transformations known as a Self Affine System (SAS). The approximation to a
rectangularly tiled block is found by evaluating various low order moments over the
block and solving a set of four linear equations for each tile. The method is easy to
implement  and  is  feasible  for  real-time  applications.  A  brief  description  of  the  method 
appears  below  while  more  extensive  details  can  be  obtained  from  Refs.  [45-47]. 

To encode an image, an Iterated Function System (IFS) of order N is found. This
is the SAS, which is defined as the set of affine transformations,

$W = \{ w_k : k = 1, \ldots, N \}$.  (4)


The  IFS  is  defined  arbitrarily  with  a  rectangular  attractor  and  is  called  the  domain  part 
of  the  BFT.  The  attractor  A  could  be,  for  example,  any  non-overlapping  tiling  of  the 
image. For each k a fractal function f(x,y), known as the function part of the transform,
is defined on A which approximates the grey scale g(x,y) of tile k. The fractal function
is specified by a recursive set of mappings,

$f(w_k(x,y)) = v_k(x, y, f(x,y))$.  (5)

Contracting an image fragment f(x,y) onto the image introduces self-similarity and
hence, the process can be viewed as fractal. The mappings form a collection of
functions which, when iterated or rendered, form an approximation to the image
according to the Collage Theorem (see Appendix B). The BFT finds the least squares
mapping and v_k can be any function that is contractive in f. When v_k is a polynomial in
(x,y), the BFT involves evaluating low order moments over the image blocks and the
solution of small sets of linear equations. Grey scale mappings can be represented as

$v_k(x,y,f) = a + b_x x + b_y y + c_x x^2 + c_y y^2 + d_x x^3 + d_y y^3 + e f(x,y)$.  (6)

A zero order fractal is one where all its coefficients except for a and e are equal to zero.
The first order terms in x and y are referred to as a bilinear fractal transform while the
second order terms are referred to as biquadratic.

By minimising the collage error (see the Collage Theorem in Appendix B) with respect to
the coefficients, a fractal least squares approximation can be obtained to a given
function g(x,y). That is, by using the rms metric (Eq. (1)), one minimises for each k by
taking partial derivatives of,

$\int_A [ g(w_k(x,y)) - a - b_x x - b_y y - e\, g(x,y) ]^2 \, dA$,  (7)

and setting them equal to zero to obtain a solution for a, b_x, b_y and e.
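
The set of four linear equations per tile can be illustrated with a small discrete sketch in Python. It is our own illustration under stated assumptions, not the implementation of Refs. [45-47]: the integral is replaced by a sum over sample points, each sample is a tuple (x, y, g_parent, g_tile) in which g_parent is the grey level at (x, y) on the parent block and g_tile is the grey level at the contracted point in tile k, and the normal equations are solved by Gaussian elimination. A degenerate block, such as a constant parent, makes the system singular and is not handled.

```python
def solve_linear(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_bilinear_tile(samples):
    """Least squares (a, b_x, b_y, e) for one tile from (x, y, g_parent, g_tile)."""
    basis = [(1.0, x, y, g) for x, y, g, _ in samples]
    target = [t for _, _, _, t in samples]
    # Normal equations: the moments of the basis functions over the block.
    normal = [[sum(u[p] * u[q] for u in basis) for q in range(4)]
              for p in range(4)]
    rhs = [sum(u[p] * t for u, t in zip(basis, target)) for p in range(4)]
    return tuple(solve_linear(normal, rhs))
```

The entries of normal and rhs are the low order moments referred to in the text, so no search over domain blocks is involved.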

Surprisingly, Monro and Woolley [47] have found that to obtain high fidelity or low
compression  images,  it  is  better  to  employ  higher  order  methods  while  low  order 
fractal  transform  methods  are  better  for  low  fidelity  images.  This  means  that  the  type 
of  fractal  transform  method  one  employs  is  dependent  upon  the  application.  For 
example,  if  the  p-UAV  were  to  be  employed  in  surveillance  missions  involving  large 
expanses  of  water,  then  low  order  fractal  transform  methods  may  be  suitable,  whereas 
for  reconnaissance  missions  in  dense  vegetation,  the  opposite  would  seem  to  apply. 

Monro and Dudbridge [48] have also developed the Accurate Fractal Rendering
Algorithm (AFRA) that enables fast decoding of video streams. The algorithm
overcomes problems associated with traditional methods which attempt to construct
an exact fractal when only a representative, finite set of pixels is required on a graphics
screen  of  finite  resolution.  Determining  this  finite  set  is  called  rendering  of  the  fractal. 
Monro  and  Dudbridge  introduce  the  non-iterative  Minimal  Plotting  Algorithm  (MPA) 
to  show  how  deterministic  fractals  can  be  rendered  by  generating  a  pixel  set  that 
approximates  the  minimum  cover  of  an  attractor.  Compared  with  the  Random 
Iteration  Algorithm  (RIA),  the  MPA  is  able  to  plot  101,258  points  of  the  particular 
fractal  known  as  the  Sierpinski  triangle  using  303,774  transformations  whereas  only 
84%  of  the  MPA  points  are  plotted  with  the  RIA  after  303,774  iterations.  The  AFRA  is 
an  adaptation  of  the  MPA,  which  renders  fractal  functions  as  given  by  Equations  (5) 
and  (6).  That  is,  it  approximates  grey  scale  images  by  a  simple  extension  of  the  MPA. 
Monro and Dudbridge conclude that the MPA and AFRA display IFS fractals at any
desired resolution in very few computations per pixel and that they help overcome a
major barrier to the application of fractal technology by supporting real-time
performance video compression/decompression. This is demonstrated in Ref. [48]
where the BFT and AFRA are combined to produce a real-time fractal video
compression scheme with bit rates as low as 40 kbits/s while still displaying images at
25  frames  per  second. 
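
To illustrate why the Random Iteration Algorithm wastes transformations, the following Python sketch renders the Sierpinski triangle with the RIA and records the distinct pixels plotted. It is a sketch of the RIA only; the MPA and AFRA of Ref. [48] are not reproduced, and the corner positions, screen size and seed are our own assumptions.

```python
import random

def sierpinski_ria(iterations, size=256, seed=0):
    """Random Iteration Algorithm: return the set of distinct pixels plotted."""
    rnd = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    x, y = 0.0, 0.0                    # start on the attractor, at a corner
    pixels = set()
    for _ in range(iterations):
        cx, cy = rnd.choice(corners)   # pick one of the three contractions
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        pixels.add((int(x * (size - 1)), int(y * (size - 1))))
    return pixels
```

Since many iterations land on pixels that have already been plotted, the number of distinct pixels returned is smaller than the number of transformations applied, which is the inefficiency a minimal plotting approach is intended to avoid.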
6.5  Parallel  Processing

An  alternative  approach  of  avoiding  the  intense  computations  required  in  the  fractal 
encoding  of  images  is  the  massively  parallel  implementation  scheme  developed  by 
Xue et al [49] for a multi-SIMD quad pyramid machine. Typically, fractal encoding
complexity is O(n^4) for an n×n image, which prohibits real-time application. The
scheme in Ref. [49], however, reduces the complexity to O(n^2), the same order as the
decoding complexity. Using a 256×256 image, Xue et al achieved the following results
on a pyramid machine with a base of m×m processors;


Table 2 Results achieved by Xue et al

m      B     tp (s)
256    8     32
256    16    87
128    8     128
128    16    350
64     8     513
64     16    1399

In  the  table,  B  represents  the  number  of  bits  in  the  data  and  tp  is  the  parallel  computing 
time.  The  scheme  can  be  implemented  on  smaller  machines,  but  as  can  be  seen  from 
the  table,  the  computing  times  increase. 
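
The fourth-power growth of a complete serial search can be seen by counting block comparisons. The sketch below is our own illustration and assumes fixed-size range blocks, domains of twice the range size taken at every pixel position and eight orientations per domain, as in the 256×256 example of Section 6.

```python
def search_comparisons(n, range_size=8, orientations=8):
    """Domain-range comparisons for an exhaustive search of an n x n image."""
    ranges = (n // range_size) ** 2               # grows as n^2
    domains = (n - 2 * range_size + 1) ** 2       # grows as n^2
    return ranges * domains * orientations

print(search_comparisons(256))   # 475799552
```

Doubling n from 256 to 512 multiplies the count by about 17, and the factor tends to 16 for larger images, as expected for a cost that grows as the fourth power of n.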


6.6  Fractals  and  the  DCT 

Sloan [50] has carried out two studies to compare fractal transform coding with JPEG
implementations  of  DCT  compression.  The  first  study  involved  images  with  a 
resolution  of  640X400  at  24  bits/pixel  while  the  second  involved  images  with  a 
resolution  of  1024X1024  at  16  bits/pixel.  Image  fidelity  was  measured  by  comparing 
the  rms  difference  between  the  original  digital  image  and  the  compressed  image  and 
then  again  using  the  decompressed  image.  In  the  first  study  compressed  file  sizes 
ranged from 5 to 50 K. It was found that for the larger file sizes both coding techniques
yielded similar results, but for the lower file sizes, fractal transform coding yielded
significantly better quality imagery than the DCT. In the second study compressed file
sizes ranged from 6 to 90 kbytes. It was found that the DCT broke down for
compression ratios greater than 200, while this only occurred for compression ratios
well in excess of 800 for fractal transform coding. For all compression ratios considered
in the second study, it was found that the rms error differences for the DCT were
always higher, i.e. of lower fidelity, than the corresponding images from fractal
compression. This was borne out by visual inspection of the resulting images in which
the JPEG images displayed block artifacts as the limit of the JPEG technique was
approached. Thus, Sloan concludes that fractal transform coding permits much
smaller file sizes to be attained.

A  novel  idea  put  forward  by  Zhao  and  Yuan  [51]  is  to  combine  fractal  transform 
coding  with  the  DCT.  They  have  claimed  that  although  fractal  transform  coding  can 
achieve  high  compression,  the  quality  of  the  decompressed  image  is  not  good.  Their 
new method partitions the original image into 8×8 range blocks and 16×16 domain
blocks denoted by F_R(u,v) and F_D(u,v) respectively. After these are transformed by
the DCT, the range blocks are classified according to their AC coefficients into simple
and complicated range blocks. A simple range block is approximated by storing its DC
coefficient F_R(0,0), which only requires 10 bits. A complicated range block is
approximated by,

$F_R(u,v) = \tau \circ \varphi(F_D(u,v))$,  (8)

where φ is a contractivity operator mapping domain blocks onto range blocks and τ is
a compound transformation consisting of one of eight possible isometries that have
been modified into DCT forms, a scaling and a luminance shift. The method then
searches  for  the  best  matching  domain  block.  Altogether  27  bits  are  required  to 
approximate  a  complicated  range  block,  which  include  10  for  the  coordinates  of  the 
best  matching  block,  3  for  the  scaling  factor,  3  for  the  isometry  and  11  for  the 
luminance  shift.  Zhao  and  Yuan  have  applied  their  method  to  the  Lena  image 
achieving  a  compression  ratio  of  only  12.4  but  a  very  good  SNR  of  32.3  dB.  However, 
there  is  no  mention  of  encoding  and  decoding  times. 

Finally, a new technique [52] is currently being developed that incorporates the high
compression capabilities of fractal transforms into a DCT based compression algorithm
as recommended by JPEG and MPEG. The idea is to extract the high frequency
information or features of an input image from the low frequency information. The
extracted features are to be encoded into fractals while the remainder of the image is to
be compressed by a combination of DCT, VQ and entropy coding following the MPEG
specification. Since only the features of an image will undergo fractal transform coding,
this will mean a reduction in the number of required computations and thus, this
technique offers the potential of achieving high compression rates and fast processing.


7.  Wavelet  Transform  Coding 

The term wavelet was introduced at the beginning of the 1980s by a French
geophysicist, J. Morlet [53,54]. It denotes a univariate function $\psi$ on $\mathbb{R}$
which, when subjected to the fundamental operations of integer shifts and dyadic
dilations, yields an orthogonal basis of $L^2(\mathbb{R})$. Such a function is called an
orthogonal wavelet, which can be applied to a finite group of data. Functionally, it is
very much like the Discrete Fourier Transform where the input signal is assumed to be
a set of discrete time samples. A wavelet can be viewed as a bump which can be
squeezed or expanded by a dilation and shifted by a translation. Wavelet coefficients
can be efficiently computed and functions reconstructed from these coefficients using
algorithms known as the wavelet transforms [54]. For a mathematical description the
reader is referred to Appendix C.
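As an illustration of shifts, dyadic dilations and reconstruction from coefficients, the following is a minimal sketch based on the Haar wavelet. The choice of the Haar wavelet and all function names are assumptions made for illustration only; the report does not state which wavelet any of the cited studies used.

```python
import math

SQRT_HALF = 1.0 / math.sqrt(2.0)

def haar_step(signal):
    """One decomposition level: scaled pairwise sums (approximation)
    and differences (detail) of neighbouring samples."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF
              for i in range(0, len(signal), 2)]
    return approx, detail

def haar_forward(signal):
    """Dyadic decomposition of a signal whose length is a power of two."""
    if len(signal) == 0 or len(signal) & (len(signal) - 1):
        raise ValueError("signal length must be a power of two")
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, detail = haar_step(approx)
        details.append(detail)  # finest scale first, coarsest scale last
    return approx, details

def haar_inverse(approx, details):
    """Rebuild the signal; the coarsest level has to be undone first."""
    current = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(current, detail):
            rebuilt.extend([(a + d) * SQRT_HALF, (a - d) * SQRT_HALF])
        current = rebuilt
    return current

samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, details = haar_forward(samples)
restored = haar_inverse(coarse, details)
```

Because the transform is orthonormal, the energy of `samples` equals the energy of the coefficients, and setting small detail coefficients to zero before calling `haar_inverse` gives a simple lossy approximation.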


7.1  Image  Compression  using  Wavelets 

Wavelet  techniques  have  attracted  much  interest  over  the  last  few  years  because  they 
not  only  eliminate  the  distortion  that  arises  from  data  blocking,  they  also  bring  about  a 
reduction  in  the  block  artefacts  associated  with  Fourier  based  spectral  methods  such  as 
the  DCT.  Furthermore,  they  can  be  employed  to  take  advantage  of  the  piecewise 
polynomial  nature  of  real  world  images  [55].  In  essence,  wavelet  techniques  are  able  to 
condense  a  large  percentage  of  the  total  image  into  low  frequency  terms  and  can  be 
used to approximate functions with little smoothness, a particularly useful feature
with  regard  to  image  compression  [54]. 


7.2  Implementation 

The  major  deficiency  of  wavelet  reconstruction  is  that  the  deepest  nested  dilation  from 
decomposition  must  be  the  first  to  be  reconstructed.  This  means  that  transformed  data 
must  be  saved  in  memory  so  that  the  output  appears  in  the  reverse  order  in  which  it  is 
calculated.  Thus,  the  size  of  the  input  blocks  and  resolution  in  wavelet  decomposition 
are  limited  by  available  memory.  In  fact,  most  of  the  effort  in  wavelet  transform  coding 
is  in  scheduling  the  filters  and  managing  the  input  and  output. 

Hoag and Ingle [56] used the pyramid approach to wavelet data compression with
vector quantisation as opposed to the commonly used scalar quantisation in which
only the most significant bits of the wavelet coefficients are kept. Their aim was to
compress underwater video data onboard an Autonomous Underwater Vehicle or
AUV to enable it to be transmitted acoustically to a remote site. To support this
application, the data must be massively compressed. They used a 256x256 test image
from a clip of underwater video taken of the Titanic. The test image exhibited the low
contrast and detail inherent in underwater imagery. The best results were obtained
with a 5-step wavelet decomposition in which the higher subband coefficients were
set to zero. The reconstruction of the Titanic image yielded a PSNR of 31.7 dB and the
quality was excellent. Hoag and Ingle then made a comparison with the JPEG DCT
algorithm. They found that in the low bit-rate (high compression) range between 0.1
and 0.2 bits/sample, the quality of the JPEG reconstructed images dropped off
dramatically due to the inherent blockiness distortion caused by zeroing too many of
the high frequency DCT coefficients. The Wavelet/VQ approach achieved much better
PSNR results at the low bit rates while better quality reconstructed images were
obtained with the JPEG algorithm for bit rates greater than 0.2 bpp. In particular, Hoag
and Ingle found that at 0.16 bpp, or a compression ratio of 50:1, the Wavelet/VQ
approach produced an image that was degraded by blurring but remained intelligible,
whereas the JPEG image was distorted beyond recognition.
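The bit rates and compression ratios quoted for these studies are related by simple arithmetic, sketched below. The sketch assumes an 8-bit grey-level source image, which matches the pairing of 0.16 bpp with 50:1 quoted above; the function names are hypothetical.

```python
def compression_ratio(bits_per_pixel, source_bits_per_pixel=8):
    """Ratio of the uncompressed to the compressed bit rate."""
    return source_bits_per_pixel / bits_per_pixel

def compressed_bytes(width, height, bits_per_pixel):
    """Size in bytes of a compressed image at the given bit rate."""
    return width * height * bits_per_pixel / 8

ratio_at_016 = compression_ratio(0.16)        # about 50, i.e. 50:1
size_256 = compressed_bytes(256, 256, 0.16)   # 256x256 image at 0.16 bpp
```

For a source with a different bit depth, for example 24-bit colour, `source_bits_per_pixel` would have to be changed accordingly.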

Zettler et al [55] were able to compress Lena images to ratios of 100:1 (0.08 bpp) and
50:1 (0.16 bpp) using wavelet transform coding. At 100:1 the decompressed image was
distorted. Despite the high noise level, however, the features of the first reconstructed
image were preserved as well as edges and general shapes. The distortion was
restricted primarily to textures. For example, a halo effect was produced in the region
immediately surrounding the image. Zettler et al also claim that advanced techniques
can be employed to reduce the apparent distortion in images. The second
decompressed image possessed considerably more fidelity. Zettler et al also concluded
that with a custom chip implementation, entire multiplication lookup tables can be
pre-loaded so that performance can be markedly improved by reducing the time
required to carry out computation, which is necessary if a real-time capability is to be
achieved with discrete wavelet transform (DWT) coding. It should also be mentioned
that Zhang Ye et al [57] have employed VQ in conjunction with DWT coding on
256x256x8 bit images and have obtained SNRs of 26.04 and 23.06 dB at coding rates of
0.78 and 0.70 bpp respectively.

In an interesting approach Rinaldo and Calvagno [58] have combined fractal transform
coding with wavelet transform coding. First, the original images undergo wavelet
decomposition whereupon each subimage is divided into range blocks. The range
blocks are then matched with domain blocks chosen in the four lowest resolution
subimages and coded through a description of the map that transforms the domain
block into a range block. Rather than recursively coding range blocks from the blocks
in the image, Rinaldo and Calvagno predict the range blocks of the subimage from the
blocks of low resolution images, which they claim simplifies the decoding procedure
considerably and allows a more accurate control of the reconstruction error. This
image decomposition technique acts as an automatic classifier of blocks, thereby
reducing the block searching time and yielding smaller mean squared errors. As a
consequence, Rinaldo and Calvagno state that their Wavelet-Fractal Coder (WFC)
provides an improvement in both the compression rate and computational time.

Rinaldo and Calvagno applied their WFC to a 512x512x8 grey-level version of the Lena
image. First, they present the original and reconstructed image at 0.25 bpp (32:1). The
visual quality of the reconstructed image was fairly good, but some artefacts and
ringing effects were noticeable near the edges. They also found that the WFC
performed better than JPEG coding over an entire range of bit rates yielding an
improvement in PSNR that was almost independent of bit rate. The total coding time
for the image was about 2 mins on a Sun SPARC workstation with similar times
involved for other images. Thus, the coding time is slightly longer than that by JPEG,
which brings into question its suitability for real-time applications at present.


8. Discussion


So far, we have reviewed the current state of the art on three of the most prominent
image compression techniques, but have not made any comparison between them,
which we aim to carry out in this section. Our comparison will be hampered by the
shortage of literature directly comparing the techniques and the fact that we have been
unable to analyse existing software employing the techniques.

Before proceeding any further we need to give an indication of what we consider to be
an acceptable image quality. Although our choice may be open to debate, we are going
to adopt the rule of thumb that an image compression algorithm yielding a PSNR of
more than 30 dB is acceptable.
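The rule of thumb above can be checked with a few lines of code. The sketch below assumes the usual definition of PSNR for 8-bit data, 10 log10(255^2 / MSE), with images supplied as flat sequences of pixel values; the function names are hypothetical and the report does not prescribe an implementation.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("images must be non-empty and of equal size")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

def is_acceptable(psnr_db, threshold_db=30.0):
    """Rule of thumb adopted in this section: more than 30 dB."""
    return psnr_db > threshold_db

reference = [10, 20, 30, 40]
decoded = [11, 21, 31, 41]          # every pixel off by one grey level
quality_db = psnr(reference, decoded)
```

An error of one grey level per pixel gives a mean squared error of 1 and therefore a PSNR of 20 log10(255) dB, well above the 30 dB threshold.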

Fisher et al [59] have made a comparison of fractal transform coding with the EPIC
(Efficient Pyramid Image Coder) wavelet compression routine and JPEG compression.
Their results are preliminary since the encoding time was not considered as a factor
and none of the codes had been adequately optimised. Because the degree of
optimisation varies greatly for each technique, it may overshadow the strength of a
particular compression technique. When encoding in fractals Fisher et al considered
both quadtree and HV-rectangular partitioning approaches.

The images used in their study were 512x512x8 bit versions of Lena and the Boat
benchmark image. The results are markedly different for PSNRs greater than 30 dB
compared with those lower than 30 dB. For lower PSNRs JPEG compression yields a
much lower compression ratio than the other approaches. For PSNRs greater than 30
dB, fractal transform coding employing quadtree partitioning yields the lowest
compression ratio. For PSNRs greater than 35 dB the three remaining techniques yield
almost identical compression ratios while for PSNRs between 30 and 35 dB there is a
noticeable difference with HV fractal encoding offering the highest compression ratios.
Specifically, for a PSNR of 30 dB JPEG coding offers a compression ratio of about 35:1,
whereas the EPIC wavelet software and the fractal encoder with HV-rectangular
partitioning offer ratios of about 45:1 and 55:1 respectively.

Fisher et al also present decoded images of the Boat benchmark image. They present
the original image first and then give the JPEG coded version at a compression ratio of
54:1 (0.147 bpp). Here the PSNR is 23.7 dB and it is quite clear that many of the
distinctive features in the original image such as the lighthouse and parts of adjacent
boats have become severely degraded. The EPIC Wavelet version is presented at a
compression ratio of 58:1 (0.138 bpp) and a PSNR of 26.4 dB. The image is of much
better quality than the JPEG image, but is not as fine as the fractal version,
which has a compression ratio of 58.1:1 and a PSNR of 27.2 dB. Although the fractal
version is the best of the images, very fine detail such as the boat's name is not as
conspicuous as in the original image.

We have already mentioned that the reception of 320x200x8 images from the p-UAV
at a rate of 1 frame/s requires a transmission rate of about 0.6 Mbits/s. For
transmission in the VHF and higher frequency bands this does not present a problem,
but unfortunately, it does mean that the p-UAV system can only be deployed in LOS
operations. For non-LOS operations transmission is possible in the HF range provided
permission can be obtained to combine four pre-allocated bandwidths as described in
Section 3, thereby extending the available data rate to 10 kbits/s. Then the data need
only be compressed by a factor of 64. We have seen that this compression ratio is almost
achievable with fractal compression, but not with wavelet and DCT encoding. Of the
remaining two techniques the EPIC wavelet routine offers significantly better
compression ratios. However, if the frame rate were reduced to 0.5 Hz or slightly
lower, then it would be possible to transmit images by employing DCT coding. In
addition, although a compression ratio of 64:1 is almost achievable with fractal
encoding, the problem with fractal encoding is whether the encoding can be
accomplished sufficiently quickly to meet near real-time requirements. Of course, this
is one of the major topics in fractal transform coding currently under investigation as
discussed in Section 6.
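The link arithmetic in the preceding paragraph can be sketched as follows. The function names are hypothetical. Note that the raw pixel payload of a 320x200x8 image is 0.512 Mbits; the factor of 64 quoted above corresponds to about 0.64 Mbits/s over a 10 kbits/s channel, so the report's figure presumably includes some overhead, which is an assumption made here and not a statement from the report.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed pixel payload in bits per second (no framing overhead)."""
    return width * height * bits_per_pixel * frames_per_second

def required_compression(source_rate, channel_rate):
    """Compression ratio needed to fit source_rate into channel_rate."""
    return source_rate / channel_rate

def max_frame_rate(bits_per_frame, channel_rate, compression_ratio):
    """Highest frame rate (Hz) the channel supports at a given ratio."""
    return channel_rate * compression_ratio / bits_per_frame

payload = raw_bit_rate(320, 200, 8, 1)              # 512,000 bits/s
needed = required_compression(640_000, 10_000)      # 64, as quoted above
dct_rate = max_frame_rate(640_000, 10_000, 35)      # about 0.55 Hz at 35:1
```

With the 35:1 ratio quoted for JPEG at 30 dB, the sketch gives a frame rate a little above 0.5 Hz, in line with the statement that DCT coding becomes feasible at 0.5 Hz or slightly lower.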

From our discussion of the three data compression techniques it can be seen that the
DCT is the technique that comes closest to offering a near real-time capability. Rinaldo
and Calvagno [58] mention that their WFC takes slightly longer than the DCT while
fractal encoding takes significantly longer than the other two methods. Furthermore, to
achieve higher compression ratios more processing time is required. Thus, parallel
implementation, software optimisation and improvements in processor technology are
still required before the other techniques will be able to match the current processing
speed of the DCT.

Transmission of 640x400x24 (VGA quality) images at a TV frame rate of 25 Hz as
described under naval operations in Section 3 requires a transmission rate of at least
144 Mbits/s, which means that transmission can only take place in the VHF or UHF
bands and only after significant compression (greater than 30:1) has been applied. For
these operations the range of the vehicle is, therefore, limited to LOS applications. In
addition to the LOS limitation, the range is dependent on the signal strength and the
gains in the antennas for the links.

Because of its compactness, the vehicle would possess limited power and operate
with an omnidirectional antenna. However, the GCS would be able to have a high gain
directional antenna capable of high transmission power, thereby allowing the range to
be extended. Typical UAV data link ranges with this arrangement lie between 40
and 50 km. For the p-UAV the available power at the GCS will certainly be less than
for typical UAVs and the antenna pointing accuracy less precise. This means that a
larger beam width would be required to monitor the p-UAV, resulting in a lower
maximum range.

In order to reduce the size of the bandwidth further after compression has been
applied to the VGA images, the only remaining option is to reduce the frame rate since
we have seen that compression ratios at best range from 30:1 for JPEG compression to
55:1 for fractal encoding for PSNR values of 30 dB. Although this means transmission
would not appear to be continuous to the operator at the GCS, operating with a smaller
bandwidth opens up the following possibilities:

(a) Transmission at the original frequency and power leads to better Signal to Noise
Ratios, or in the case of digital data reduced Bit Error Rates (BER). This
improvement can be used to extend the range or provide better immunity to
external interference.

(b) The transmission power may be reduced by trading the improved SNR/BER
against transmission power.

(c) The improved SNR/BER could also be traded for wider antenna beam width
(thereby reducing antenna pointing requirements) or longer range.
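The size of the trade described in (a) to (c) can be estimated with a short sketch. It rests on assumptions that are not stated in the report: the receiver is limited by thermal noise, so that noise power is proportional to bandwidth; the occupied bandwidth scales with the bit rate; transmit power and antenna gains are unchanged; and propagation follows a free-space law with received power falling as the square of range. The function names are hypothetical.

```python
import math

def snr_gain_db(old_bandwidth, new_bandwidth):
    """SNR improvement from narrowing the bandwidth at constant power,
    assuming noise power proportional to bandwidth."""
    return 10.0 * math.log10(old_bandwidth / new_bandwidth)

def range_factor(old_bandwidth, new_bandwidth, path_loss_exponent=2.0):
    """Factor by which range could grow if the whole SNR gain is spent
    on range; exponent 2 corresponds to free-space propagation."""
    return (old_bandwidth / new_bandwidth) ** (1.0 / path_loss_exponent)

gain_half_rate = snr_gain_db(2.0, 1.0)     # halving the frame rate
range_half_rate = range_factor(2.0, 1.0)   # corresponding range factor
```

Under these assumptions, halving the frame rate is worth about 3 dB, which could instead be spent on roughly a 40 per cent increase in range, a halving of transmit power or a wider antenna beam.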


9.  Conclusion 


In summary, the downlink for the p-UAV system in the short term could be based on
DCT compression operating in the VHF/UHF bands and would have an LOS range of
about 30 km using a directional antenna. For the naval operations mentioned in
Section 3, however, transmission of VGA quality images would almost certainly
require a reduced frame rate from the TV rate of 25 Hz. For land based operations
where the resolution is not as critical and hence, can be reduced significantly, it would
be possible to transmit images at the TV frame rate. Alternatively, less compressed or
better quality images could be transmitted at the reduced frame rate.

A longer term aim would be to employ fractal and/or wavelet transform coding
techniques to carry out the image compression. With continuing research into these
data compression techniques and further advances in microprocessor technology,
there is more than a real possibility that these techniques will offer a near real-time
capability with compression ratios significantly higher than DCT compression in the
not too distant future.


Appendix A

Mathematical Details of the Discrete Cosine Transform


As mentioned in Section 5, the DCT's success as an image compression technique lies
in its ability to eliminate the less visually stimulating high frequency components of a
signal and to retain the quantised values of the low-frequency Fourier coefficients. In
this appendix we present the mathematical details that are necessary for
understanding how an algorithm based on the DCT can be developed.

Signals are defined in terms of discrete values of the independent time variable and are
represented mathematically as sequences of numbers. A discrete-time system is
essentially an algorithm for converting one of these sequences (an input) to another
(an output) [60]. If x(n) represents an input sequence, y(n) an output sequence and
h(n) the response of the system to a digital impulse, then

$$y(n) = \sum_{m=-\infty}^{\infty} h(m)\,x(n-m). \qquad (A1)$$

Introducing $x(n) = \exp(i\omega n)$, where $\omega$ is the frequency, into the above
equation yields

$$y(n) = e^{i\omega n} \sum_{m=-\infty}^{\infty} h(m)\,e^{-i\omega m} = x(n)\,H(e^{i\omega}), \qquad (A2)$$

where $H(e^{i\omega})$ is the Fourier Series representation of the impulse response [60].

The Discrete Fourier Transform (DFT) is obtained by considering a sequence x(n) with
period N such that

$$x(n) = \sum_{k=-\infty}^{\infty} X(k)\,\exp(2\pi i n k/N), \qquad (A3)$$

where $\omega = 2\pi k/N$ are the only possible frequencies. Because of the periodicity
only N of these exponentials are distinct, so that, with a normalising factor of 1/N,
x(n) can be simplified to

$$x(n) = \frac{1}{N}\sum_{k=0}^{N-1} X(k)\,\exp(2\pi i n k/N), \qquad (A4)$$

while the DFT becomes

$$X(k) = \sum_{n=0}^{N-1} x(n)\,\exp(-2\pi i k n/N). \qquad (A5)$$

Following Ref. [61] we consider the signal to be a 2N-point even extension of the
discrete-time signal x(n) so that

$$g(n) = \begin{cases} x(n), & 0 \le n \le N-1, \\ x(2N-1-n), & N \le n \le 2N-1. \end{cases} \qquad (A6)$$

As a consequence, the DFT for g(n) can be written as

$$G(k) = \sum_{n=0}^{2N-1} g(n)\,W_{2N}^{nk}, \qquad (A7)$$

where $W_{2N}^{nk} = \exp(-i\pi n k/N)$, while its inverse, known as the IDFT, is given by

$$g(n) = \frac{1}{2N}\sum_{k=0}^{2N-1} G(k)\,W_{2N}^{-nk}. \qquad (A8)$$

Substituting Eq. (A6) into Eq. (A7) yields after some algebra

$$G(k) = 2\,W_{2N}^{-k/2}\sum_{n=0}^{N-1} x(n)\cos\left[\frac{\pi(2n+1)k}{2N}\right] = W_{2N}^{-k/2}\,C(k), \qquad (A9)$$

with

$$C(k) = 2\sum_{n=0}^{N-1} x(n)\cos\left[\frac{\pi(2n+1)k}{2N}\right], \qquad (A10)$$

and $0 \le k \le 2N-1$. Eq. (A10), restricted to $0 \le k \le N-1$, is known as the One
Dimensional Discrete Cosine Transform (1D-DCT) of the discrete time signal and its
inverse, the 1D-IDCT, is given by

$$x(n) = \frac{1}{N}\left[\frac{C(0)}{2} + \sum_{k=1}^{N-1} C(k)\cos\left[\frac{\pi(2n+1)k}{2N}\right]\right], \qquad (A11)$$

where $0 \le n \le N-1$. It is as a result of the even symmetry of the signal that the
cosine factors have appeared.
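The one-dimensional transform pair derived above, a forward transform C(k) equal to twice the sum of x(n) cos[pi(2n+1)k/2N] and an inverse in which the k = 0 term is halved, can be checked numerically with the sketch below. It is a direct, unoptimised evaluation written for illustration; the function names are hypothetical and a practical coder would use a fast algorithm.

```python
import math

def dct_1d(x):
    """Forward transform: C(k) = 2 * sum_n x(n) cos(pi (2n+1) k / 2N)."""
    size = len(x)
    return [2.0 * sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
                      for n in range(size))
            for k in range(size)]

def idct_1d(c):
    """Inverse transform: x(n) = (1/N) [C(0)/2 + sum_{k>=1} C(k) cos(...)]."""
    size = len(c)
    out = []
    for n in range(size):
        total = c[0] / 2.0
        for k in range(1, size):
            total += c[k] * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
        out.append(total / size)
    return out

signal = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
coefficients = dct_1d(signal)
recovered = idct_1d(coefficients)
```

For a constant signal only C(0) is non-zero, which illustrates why smooth image regions compress well under the DCT.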

Image data represent two-dimensional signals and thus the preceding material must
be extended before it can be utilised as a data compression technique. The 2D DCT is
obtained by defining the signal as a $(2N_1 \times 2N_2)$-point even extension in which

$$y(n_1,n_2) = \begin{cases}
x(n_1,\, n_2), & 0 \le n_1 \le N_1-1,\; 0 \le n_2 \le N_2-1, \\
x(2N_1-n_1-1,\, n_2), & N_1 \le n_1 \le 2N_1-1,\; 0 \le n_2 \le N_2-1, \\
x(n_1,\, 2N_2-n_2-1), & 0 \le n_1 \le N_1-1,\; N_2 \le n_2 \le 2N_2-1, \\
x(2N_1-n_1-1,\, 2N_2-n_2-1), & N_1 \le n_1 \le 2N_1-1,\; N_2 \le n_2 \le 2N_2-1.
\end{cases}$$

By using the above results the 2D Discrete Fourier Transform of $y(n_1,n_2)$ can be
expressed as

$$Y(k_1,k_2) = \sum_{n_1=0}^{2N_1-1}\sum_{n_2=0}^{2N_2-1} y(n_1,n_2)\,W_{2N_1}^{n_1 k_1}\,W_{2N_2}^{n_2 k_2}, \qquad (A12)$$

with $W_{2N_j} = \exp(-i\pi/N_j)$, while its inverse (the 2-D IDFT) is given by

$$y(n_1,n_2) = \frac{1}{4N_1N_2}\sum_{k_1=0}^{2N_1-1}\sum_{k_2=0}^{2N_2-1} Y(k_1,k_2)\,W_{2N_1}^{-n_1 k_1}\,W_{2N_2}^{-n_2 k_2}. \qquad (A13)$$

Substituting $y(n_1,n_2)$ into Eq. (A12), one obtains after some algebra

$$Y(k_1,k_2) = W_{2N_1}^{-k_1/2}\,W_{2N_2}^{-k_2/2}\,C(k_1,k_2), \qquad (A14)$$

where

$$C(k_1,k_2) = 4\sum_{n_1=0}^{N_1-1}\sum_{n_2=0}^{N_2-1} x(n_1,n_2)\cos\left[\frac{\pi(2n_1+1)k_1}{2N_1}\right]\cos\left[\frac{\pi(2n_2+1)k_2}{2N_2}\right], \qquad (A15)$$

and $0 \le k_j \le N_j-1$ $(j = 1,2)$. Eq. (A15) is known as the 2D Discrete Cosine
Transform (2-D DCT) of the sequence $x(n_1,n_2)$ and its inverse, the 2-D IDCT, is given by

$$x(n_1,n_2) = \frac{1}{N_1N_2}\sum_{k_1=0}^{N_1-1}\sum_{k_2=0}^{N_2-1} C'(k_1,k_2)\cos\left[\frac{\pi(2n_1+1)k_1}{2N_1}\right]\cos\left[\frac{\pi(2n_2+1)k_2}{2N_2}\right], \qquad (A16)$$

where

$$C'(k_1,k_2) = \begin{cases}
C(0,0)/4, & k_1 = 0,\; k_2 = 0, \\
C(k_1,0)/2, & k_1 \ne 0,\; k_2 = 0, \\
C(0,k_2)/2, & k_1 = 0,\; k_2 \ne 0, \\
C(k_1,k_2), & k_1 \ne 0,\; k_2 \ne 0.
\end{cases}$$

In JPEG compression the pixel values from 8x8 blocks of the original image are first
adjusted to centre them at zero. For example, if pixel data are in an 8-bit format, then
128 is subtracted from them so that the signal can be regarded as even. Then the DCT
or Eq. (A15) is applied to the normalised pixel values so that transformed 8-bit data are
stored as 11 bit signed integers. It is these signed integers that are quantised by
dividing them by a quantisation coefficient and rounding off to the nearest integer.
The quantisation coefficients vary with frequency: quantisation can be much
more severe for higher frequencies than for lower frequencies because of the
human visual system's relative insensitivity to high frequencies. In addition, because
JPEG compression transforms 8x8 blocks, there is no need to evaluate the cosine
factors in Eq. (A15) repeatedly. Instead, a look-up table can be employed. For more
details regarding implementation the reader is referred to Refs. [25] and [26].
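The steps just described (level shift, block DCT with a cosine look-up table, division by a quantisation coefficient and rounding) can be sketched as follows. Two assumptions should be noted. First, the sketch uses the orthonormal scaling of the JPEG standard, which differs from the scaling of the appendix equations by constant factors and is what keeps the coefficients of 8-bit data within 11 bits. Second, `example_table` is a made-up quantisation table that merely grows with frequency; it is not the table recommended by JPEG.

```python
import math

N = 8
# Cosine look-up table, built once: COS[k][n] = cos(pi (2n+1) k / 2N).
COS = [[math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N)]
       for k in range(N)]

def scale(k):
    """JPEG normalisation factor: 1/sqrt(2) for the zero-frequency index."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct_8x8(block):
    return [[0.25 * scale(k1) * scale(k2) *
             sum(block[n1][n2] * COS[k1][n1] * COS[k2][n2]
                 for n1 in range(N) for n2 in range(N))
             for k2 in range(N)] for k1 in range(N)]

def idct_8x8(coeffs):
    return [[0.25 * sum(scale(k1) * scale(k2) * coeffs[k1][k2] *
                        COS[k1][n1] * COS[k2][n2]
                        for k1 in range(N) for k2 in range(N))
             for n2 in range(N)] for n1 in range(N)]

def example_table(base=16, step=4):
    """Hypothetical quantisation table, coarser at higher frequencies."""
    return [[base + step * (k1 + k2) for k2 in range(N)] for k1 in range(N)]

def encode_block(pixels, table):
    shifted = [[p - 128 for p in row] for row in pixels]   # centre at zero
    coeffs = dct_8x8(shifted)
    return [[round(coeffs[i][j] / table[i][j]) for j in range(N)]
            for i in range(N)]

def decode_block(levels, table):
    coeffs = [[levels[i][j] * table[i][j] for j in range(N)] for i in range(N)]
    return [[round(v) + 128 for v in row] for row in idct_8x8(coeffs)]

flat = [[200] * N for _ in range(N)]
levels = encode_block(flat, example_table())
decoded = decode_block(levels, example_table())
```

For the uniform block in the example only the zero-frequency level is non-zero, so the whole block is described by a single integer; the long runs of zeros produced in this way are what the subsequent entropy coding stage exploits.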


Appendix B

Fractal Transform Coding


Here we review the basic mathematical concepts underpinning the implementation of
deterministic  fractals  in  image  compression.  For  a  more  comprehensive  treatment  of 
the  subject  the  reader  is  referred  to  Refs.  [26,38,62]. 

Barnsley  [26,62]  was  first  to  propose  the  idea  of  fractal  image  compression  in  which 
real-life  images  could  be  modelled  by  deterministic  fractals.  Deterministic  fractals 
represent  the  fixed  points  of  a  set  of  two-dimensional  affine  transformations.  As  a 
consequence, the mathematics of Iterated Function Systems (IFS) and Recurrent Iterated
Function Systems (RIFS) together with the Collage Theorem have been developed to
provide  the  theoretical  foundation  of  fractal  image  compression.  We  shall  describe 
IFSs  here,  although  it  should  be  pointed  out  that  to  encode  images  in  an  automated 
approach,  one  must  use  piecewise  affine  contractive  transformations,  which  make  use 
of  only  the  partial  self-transformability  of  images.  Fisher  [38]  refers  to  these  as 
Partitioned  Iterated  Function  Systems  or  PIFS.  Because  PIFSs  allow  not  only  the  encoding 
of  grey  scale  images,  but  also  partition  an  image  into  pieces  that  can  be  transformed 
separately,  they  are  able  to  encode  many  shapes  that  cannot  be  encoded  by  IFSs. 

The basic building block of present fractal image compression systems is the affine
transformation which for two dimensions is defined as a mapping
$w: \mathbb{R}^2 \to \mathbb{R}^2$ where $w(x,y) = (ax + by + e,\; cx + dy + f)$ and
$a,b,c,d,e,f \in \mathbb{R}$. Such a transformation can be represented in matrix form as

$$w(\mathbf{x}) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} e \\ f \end{pmatrix} = A\mathbf{x} + T. \qquad (B1)$$

If we consider the one-dimensional affine transformation $f(x) = ax + b$,
$\forall x \in \mathbb{R}$, over the interval [0,1], then the new interval length becomes
$|a|$. Thus, the transformation f rescales the interval by a factor a while the left
endpoint of the interval is translated to b. When $|a| < 1$, the affine transformation is
said to be contractive. For higher dimensions we require the theory of metric spaces to
define a contractive transformation.

Let $(X,d)$ denote the complete metric space of digital images where d is a given
metric or distortion measure and let $\mu_0$ denote the original image to be encoded. The
goal of iterated transformation theory is the construction of a contractive image
transformation, $\tau$, defined from the space $(X,d)$ into itself, for which $\mu_0$ is an
approximate fixed point. This is known as the Inverse Problem. The transformation,
$\tau$, is referred to as the fractal code for $\mu_0$ while $\mu_0$ is said to be approximately
self-transformable under $\tau$.

We mentioned above that deterministic fractals represent the fixed points of sets of
two-dimensional affine transformations. The fixed point of a transformation
$f: X \to X$ on a metric space $(X,d)$ is the point $x_f \in X$ such that $f(x_f) = x_f$.
A transformation $f: X \to X$ on a metric space $(X,d)$ is Lipschitz if there is a
constant s (known as a Lipschitz factor) such that
$d(f(x),f(y)) \le s\,d(x,y)$ $\forall x,y \in X$. When $s < 1$, f is said to be contractive
or a contraction mapping.

We  are  now  in  a  position  to  give  one  of  the  fundamental  results  of  fractal  image 
compression: 

Theorem 1 - The Contraction Mapping Theorem

Let $f: X \to X$ be a contraction mapping on a complete metric space $(X,d)$.
Then f possesses a unique point $x_f \in X$ such that for any point $x \in X$, the
sequence $\{f^{\circ n}(x): n = 0,1,2,\ldots\}$ converges to $x_f$, i.e.

$$\lim_{n\to\infty} f^{\circ n}(x) = x_f \quad \forall x \in X. \qquad (B2)$$

The point $x_f$ is called a fixed point of the mapping f. The proof of this theorem can be
found in Refs. [38,62]. This theorem states that the fixed point of a transformation f
will be the image one gets when the sequence $f(x_0)$, $f(f(x_0))$, $f(f(f(x_0)))$, ..., is
computed for any image $x_0$. That is, as long as the transformation is contractive in the
space of images, it will have a unique fixed point that will then be some image.
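The behaviour described by the Contraction Mapping Theorem can be seen in a small numerical sketch. It uses a one-dimensional affine map a*x + b with |a| < 1 in place of a transformation on images, which is a simplification made here for illustration; the function names are hypothetical.

```python
def affine(a, b):
    """Return the one-dimensional affine map x -> a*x + b."""
    return lambda x: a * x + b

def iterate_to_fixed_point(f, x0, tol=1e-12, max_iter=10_000):
    """Apply f repeatedly until successive iterates differ by less than tol."""
    x = x0
    for step in range(1, max_iter + 1):
        nxt = f(x)
        if abs(nxt - x) < tol:
            return nxt, step
        x = nxt
    raise RuntimeError("no convergence; is the map contractive?")

f = affine(0.5, 1.0)                       # |a| = 0.5 < 1, so f is contractive
from_zero, steps_zero = iterate_to_fixed_point(f, 0.0)
from_far, steps_far = iterate_to_fixed_point(f, 100.0)
```

Both starting points lead to the same limit b/(1 - a) = 2, which is the sense in which a decoder may start from any image and still arrive at the encoded one.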

In fractal image compression it is convenient to use the Hausdorff space $H(X)$ where
one can study compact subsets of metric spaces. This means that by using
$H(\mathbb{R}^2)$ one can concentrate solely on drawings, pictures and other black on
white subsets of $\mathbb{R}^2$. In addition, when using this space, another metric is
required, which is known as the Hausdorff distance or metric $h(A,B)$. For a complete
metric space $(X,d)$ this metric for points A and B in $H(X)$ is given by
$h(A,B) = d(A,B) \vee d(B,A)$, where $x \vee y$ represents the maximum of x and y and
$d(A,B) = \max_{x \in A}\min_{y \in B} d(x,y)$ is the distance from the set A to the set B.
Equivalently, if A is an element of the associated Hausdorff space $H(X)$, then

$$A_d(\varepsilon) = \{x \mid d(x,y) \le \varepsilon \ \text{for some}\ y \in A\},$$

which means that $A_d(\varepsilon)$ is the set of points at a distance of at most
$\varepsilon$ from A. The Hausdorff distance between two elements A and B of $H(X)$
becomes

$$h_d(A,B) = \max\bigl\{\inf\{\varepsilon \mid B \subset A_d(\varepsilon)\},\; \inf\{\varepsilon \mid A \subset B_d(\varepsilon)\}\bigr\}.$$
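For finite point sets the Hausdorff distance can be computed directly, as in the sketch below. It assumes the Euclidean metric for d and represents each set as a collection of coordinate tuples; both choices, and the function names, are assumptions made for illustration.

```python
import math

def distance_to_set(point, points):
    """Distance from one point to the nearest point of a finite set."""
    return min(math.dist(point, q) for q in points)

def directed_distance(a, b):
    """Smallest eps such that every point of a lies within eps of b,
    i.e. such that a is contained in the eps-dilation of b."""
    return max(distance_to_set(p, b) for p in a)

def hausdorff(a, b):
    """Hausdorff distance: the larger of the two directed distances."""
    return max(directed_distance(a, b), directed_distance(b, a))

set_a = [(0.0, 0.0)]
set_b = [(0.0, 0.0), (3.0, 4.0)]
h_ab = hausdorff(set_a, set_b)
```

In the example every point of `set_a` also belongs to `set_b`, yet the distance is not zero because `set_b` contains a point 5 units away from `set_a`; the two directed distances differ, which is why the maximum of both is taken.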

An Iterated Function System consists of a complete metric space $(X,d)$ together with a
finite set of contraction mappings $w_n: X \to X$ with respective contractivity factors
$s_n$ for $n = 1,2,\ldots,N$. An IFS is denoted by $\{X; w_n, n = 1,2,\ldots,N\}$ with its
contractivity factor $s = \max\{s_n: n = 1,2,\ldots,N\}$. IFSs, or their generalisations
mentioned above, are the basic building blocks of fractal transform coding. The PIFSs
presented in Ref. [38] possess not only the two spatial dimensions of IFSs but also a
third dimension for the grey levels of an image. This means that Eq. (B1) has to be
modified to include a z component with A now becoming a 3x3 matrix and T a three
dimensional column vector. The third row and column of A consist of zeros except for
the diagonal element, which consists of a term to control the contrast, and the
additional element $o_i$ in T controls the brightness of the transformation.

The following theorem proposes methods for constructing the fixed point (or attractor)
of an IFS. Let $\{\mathbb{R}^2; w_1, w_2, \ldots, w_N\}$ be an IFS and choose a compact set
$A_0 \subset \mathbb{R}^2$. Then a sequence $\{A_n: n = 0,1,2,\ldots\} \subset H(\mathbb{R}^2)$
can be constructed by applying the transformation W of the theorem repeatedly, i.e.
$A_{n+1} = W(A_n)$. According to this theorem, known as the IFS theorem, the sequence
$\{A_n\}$ converges to the attractor of the IFS in the Hausdorff metric. Thus, we have a
procedure for calculating successive approximations to the fixed point of an IFS.

Theorem 2 - The IFS Theorem

Let $\{X; w_n, n = 1,2,\ldots,N\}$ be an IFS with contractivity factor s. Then the
transformation $W: H(X) \to H(X)$ defined by

$$W(B) = \bigcup_{n=1}^{N} w_n(B) \quad \forall B \in H(X),$$

is a contraction mapping on the complete metric space $(H(X),h(d))$ with contractivity
factor s. That is, $h(W(B),W(C)) \le s\,h(B,C)$, $\forall B,C \in H(X)$. Its unique fixed
point, $A \in H(X)$, obeys

$$A = W(A) = \bigcup_{n=1}^{N} w_n(A), \qquad (B3)$$

and is given by

$$A = \lim_{n\to\infty} W^{\circ n}(B) \quad \forall B \in H(X).$$

This result is proved in Ref. [62]. The fixed point is called the attractor of the IFS and as
a consequence of its uniqueness, we are led to the following theorem:

Theorem  3-The  Collage  Theorem 

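Let $(X, d)$ be a complete metric space. Let $L \in H(X)$ and $\varepsilon > 0$ be given. Choose an IFS
$\{X;\, w_n,\ n = 1, 2, \ldots, N\}$ with contractivity factor $0 \le s < 1$, so that
$h\left(L, \bigcup_{n=1}^{N} w_n(L)\right) \le \varepsilon$. Then

$$h(L, A) \le \frac{\varepsilon}{1 - s},$$

where A is the attractor of the IFS.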

This theorem, which is also proved in Ref. [62], states that for a given set or image L
an IFS, or a set of contractive transformations, can be found whose attractor is close to L,
provided that the union or collage of the images of L under the transformations is close to or
looks like L. The degree to which two images look alike is measured by using the
Hausdorff metric which in turn depends on the metric d.
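
To make the role of the Hausdorff metric and the bound of the Collage Theorem concrete, the following Python sketch evaluates both for finite point sets. It is an illustration under stated assumptions rather than part of the report: the example IFS on the real line (w1(x) = x/2 and w2(x) = x/2 + 1/2, with s = 0.5 and attractor [0, 1]), the five-point set L, the fine grid used in place of the attractor and all function names are hypothetical choices.

```python
import math

def hausdorff(a, b):
    """Hausdorff distance between two finite sets of points (tuples)."""
    def directed(u, v):
        # Largest distance from a point of u to its nearest neighbour in v.
        return max(min(math.dist(p, q) for q in v) for p in u)
    return max(directed(a, b), directed(b, a))

def collage(maps, points):
    """Union of the images of `points` under every map of the IFS."""
    return [w(p) for w in maps for p in points]

# Assumed example IFS on the real line; both maps halve distances (s = 0.5).
maps = [lambda p: (p[0] / 2.0,), lambda p: (p[0] / 2.0 + 0.5,)]
s = 0.5

L = [(k / 4.0,) for k in range(5)]                 # the set to be approximated
eps = hausdorff(L, collage(maps, L))               # collage error h(L, W(L))
attractor = [(k / 1024.0,) for k in range(1025)]   # fine grid standing in for [0, 1]
actual = hausdorff(L, attractor)                   # distance from L to the attractor
bound = eps / (1.0 - s)                            # Collage Theorem bound
```

Here the collage error is 0.125, so the theorem guarantees that L lies within 0.25 of the attractor; the distance measured against the grid is 0.125, which respects the bound.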

To summarise the above, let $(X, d)$ be a complete metric space and $\mu \in H(X)$ be any
given image. Given a set of contractive transformations such that $T: H(X) \to H(X)$,
we know from the Contraction Mapping Theorem that T has a unique fixed point $x_T$ with,

$$x_T = T(x_T) = \lim_{n \to \infty} T^{\circ n}(\nu), \quad \forall \nu \in H(X). \tag{B4}$$

From Theorem 2,

$$\lim_{n \to \infty} T^{\circ n}(\nu) = T(\mu) = \mu, \quad \forall \nu \in H(X).$$

Since an attractor is unique, we have $\mu = x_T$ and $\mu$ must be formed of transformed copies
of itself. According to the Collage Theorem, minimising the distance between $\mu$ and
$T(\mu)$ (the collage of the image) minimises the distance between the fixed point $x_T$ and $\mu$.
In practice, it is not possible to find a T such that $\mu = T(\mu)$, but it is possible to find a
$\mu_{\text{approx}}$ satisfying $T(\mu_{\text{approx}}) = \mu_{\text{approx}}$. That is,

$$\mu \approx \mu_{\text{approx}} = T(\mu_{\text{approx}}).$$

Now we can say that a fractal image is constructed from a 'collage' of transformed
copies of itself and is thus inherently self-similar.

Consider an image as a surface lying over a plane, defined by a function $f(x, y)$ that
returns a number between 0 and 1 at each position $(x, y)$; the range 0 to 1 can represent
grey levels from black to white [38]. The original image $\mu_{\text{orig}}$ becomes a function
mapping the unit square into the real numbers, i.e. $\mu_{\text{orig}}: [0,1] \times [0,1] \to \mathbb{R}$. The image is
now split into non-overlapping domain blocks $D_i$ and range blocks $R_i$, where the union
of the domain blocks yields the original image, i.e. $\bigcup_i D_i = \mu_{\text{orig}}$. Furthermore, we map
the domain blocks into range blocks by a collection of affine transformations so that
$D_i \to T(D_i) = R_i$. This means that the fractal compression scheme has been reduced to
a search through all the range blocks, viz. the set of all $R_i$'s, to find an $R_i$ for each $D_i$
which minimises some measure of distortion or similarity. That is, we search for the
part of the image that most looks like the part of the image in the domain block [38].
Specifically, we seek a transformation $g \in T$ where,

$$\forall \mu, \nu \in X,\ \exists\, s < 1 \ \text{such that}\ d(g(\mu), g(\nu)) \le s\, d(\mu, \nu) \quad \text{and} \quad d(\mu_{\text{orig}}, g(\mu_{\text{orig}}))$$

is as 'close to zero' as possible. By repeated use of the triangle inequality, it can be
shown for any image $\mu_0$ and any positive integer n that,

$$d(\mu_{\text{orig}}, g^{\circ n}(\mu_0)) \le \frac{1}{1 - s}\, d(\mu_{\text{orig}}, g(\mu_{\text{orig}})) + s^n\, d(\mu_{\text{orig}}, \mu_0).$$

From this result it can be seen that, after a number of iterations, the terms of any
iterated sequence of the form $\{\mu_n = g^{\circ n}(\mu_0)\}$, where $\mu_0$ is some arbitrary initial
image, cluster around the original image. In a space of discrete images, the sequence
converges exactly to a stable image which is its attractor. The closeness of $g^{\circ n}(\mu_0)$
to the original image is determined by the measure of distortion $d(\mu_{\text{orig}}, g(\mu_{\text{orig}}))$, which is generally
taken to be the root mean square difference between image blocks as described in
Section 4 and which is known in mathematical terms as an $L^2$ metric.

A fractal code is obtained from the search and represents a statement such as 'region A
of an image is most like region B after transformation'. The original image, or rather the
pixel data, is not transmitted to a decoder, only the fractal code. Thus, decoding an
image consists of repeatedly applying the transformations in the fractal code to an
arbitrary initial image $I_0$ until the images converge to a fixed point [38].
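
The search and the iterative decoding can be sketched for a one-dimensional signal. The Python code below is a toy illustration under explicit assumptions, not the scheme of Ref. [38]: it works on a short list of grey levels instead of an image, calls the small blocks being approximated "targets" and the larger blocks they are matched against "sources" (hypothetical names chosen here), fits contrast and brightness by least squares, and caps the contrast at 0.9 so that the decoding map is contractive.

```python
def fit_contrast_brightness(src, target, max_contrast=0.9):
    """Least-squares s, o so that s*src + o approximates target, with |s| capped."""
    n = len(src)
    mean_s = sum(src) / n
    mean_t = sum(target) / n
    var = sum((a - mean_s) ** 2 for a in src)
    cov = sum((a - mean_s) * (b - mean_t) for a, b in zip(src, target))
    s = cov / var if var > 0 else 0.0
    s = max(-max_contrast, min(max_contrast, s))
    return s, mean_t - s * mean_s

def shrink(block):
    """Average neighbouring samples, halving the block length."""
    return [(block[2 * i] + block[2 * i + 1]) / 2.0 for i in range(len(block) // 2)]

def encode(signal, size=2):
    """Return one (source index, contrast, brightness) triple per target block."""
    sources = [shrink(signal[start:start + 2 * size])
               for start in range(0, len(signal) - 2 * size + 1, size)]
    code = []
    for t in range(0, len(signal), size):
        target = signal[t:t + size]
        best = None
        for idx, src in enumerate(sources):
            s, o = fit_contrast_brightness(src, target)
            err = sum((s * a + o - b) ** 2 for a, b in zip(src, target))
            if best is None or err < best[0]:
                best = (err, idx, s, o)
        code.append(best[1:])          # keep only the transformation, not the samples
    return code

def decode(code, length, size=2, iterations=60, start=None):
    """Iterate the coded transformation from an arbitrary initial signal."""
    current = list(start) if start is not None else [0.0] * length
    for _ in range(iterations):
        nxt = []
        for idx, s, o in code:
            src = shrink(current[idx * size: idx * size + 2 * size])
            nxt.extend(s * a + o for a in src)
        current = nxt
    return current

signal = [10.0, 12.0, 50.0, 52.0, 90.0, 92.0, 130.0, 132.0]
code = encode(signal)
from_black = decode(code, len(signal))
from_white = decode(code, len(signal), start=[255.0] * len(signal))
```

Only `code` would be transmitted. Decoding from an all-black and from an all-white starting signal converges to the same fixed point, and for this particular signal every target block is matched exactly, so the fixed point coincides with the original samples.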

Wavelets represent a family of basis functions derived from one single function
subjected to shifts and dilations. In this appendix we present the basic properties of
wavelets, which are necessary for understanding and developing a wavelet transform
coding scheme in image compression.

It is shown in [63] that it is not necessary for the set of wavelet functions given by the
recursive equation $\psi_{j,k}(x) := 2^{k/2}\, \psi(2^k x - j)$ to form a complete orthonormal set.
However, when they do, any function $f \in L^2(\mathbb{R})$ can be decomposed into a series of
the form,

$$f = \sum_{j,k \in \mathbb{Z}} \langle f, \psi_{j,k} \rangle\, \psi_{j,k}, \tag{C1}$$

where $\langle f, g \rangle := \int f g\, dx$ is the usual inner product of two $L^2(\mathbb{R})$ functions. Eq. (C1) can
be viewed as the construction of f from bumps $\psi_{j,k}$ (functions with compact support)
with small values of k contributing to the broad resolution of f and large values of k
producing the finer detail. The decomposition given by Eq. (C1) is analogous to a
Fourier decomposition of f in terms of the exponential functions $e_n(x) := e^{inx}$, although
important differences exist. For example, all terms in the Fourier series contribute to
the value of f at a point x while wavelets are usually of compact support or fall off
exponentially at infinity. Thus, only the terms in Eq. (C1) corresponding to $\psi_{j,k}$ with
$j \cdot 2^{-k}$ near x yield a large contribution at x. Hence, the representation can be regarded
as local.

Multiresolution analysis is a method of creating an orthonormal wavelet basis by
breaking $L^2(\mathbb{R})$ up into a sequence of closed subspaces $V_j$ in the form of

$$\cdots \subset V_2 \subset V_1 \subset V_0 \subset V_{-1} \subset V_{-2} \subset \cdots, \tag{C2}$$


where $V_m \to L^2(\mathbb{R})$ as $m \to -\infty$. These subspaces are subject to the following properties:

(i) $V_m \subset V_{m-1}$ for all $m \in \mathbb{Z}$,

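(ii) $\bigcup_{n=-\infty}^{\infty} V_n$ is dense in $L^2(\mathbb{R})$ and $\bigcap_{n=-\infty}^{\infty} V_n = \{0\}$,

(iii) $f(x) \in V_m \Leftrightarrow f(2x) \in V_{m-1}$,

(iv) $f(x) \in V_0 \Leftrightarrow f(x - k) \in V_0\ \ \forall k \in \mathbb{Z}$, and

(v) $\exists\, g \in V_0$ such that $g(x - k)$, $k \in \mathbb{Z}$, is a Riesz basis for $V_0$.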

A Riesz basis is a set $\{x_n\}$ in a Hilbert space H for which there exist an orthonormal basis
$\{e_n\}$ and a bounded, invertible linear operator T related by

$$T e_n = x_n, \quad \forall n. \tag{C3}$$

In addition to the above, there exists the following rule concerning the speed of
oscillations,

$$f \in V_m \Leftrightarrow f(2^m\, \cdot) \in V_0, \quad m \in \mathbb{Z}. \tag{C4}$$

So if f is an oscillating function in $V_0$, then the function $f(2\, \cdot)$ oscillates twice as fast as an
element of $V_{-1}$.

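In wavelet theory it is assumed that $V_0$ is generated by the integer translates
$\phi_{0n}(x) = \phi(x - n)$ of one single function $\phi$ known as the father. Each $f \in V_0$ can be written as,

$$f(x) = \sum_{n=-\infty}^{\infty} a_n\, \phi(x - n). \tag{C5}$$

Since $\phi \in V_0$ and $V_0 \subset V_{-1}$, we have $\phi \in V_{-1}$ and, from Eq. (C4), $\phi(2^{-1}\, \cdot) \in V_0$. Thus, the
wavelet basis is given by the recursive difference equation,

$$\phi(x) = \sum_{n=-\infty}^{\infty} c_n\, \phi(2x - n), \quad x \in \mathbb{R}, \tag{C6}$$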


for some coefficients $\{c_n\}$. The range of the summation in Eq. (C6) is determined by the
number of nonzero coefficients; this number is arbitrary and is also referred to as the order
of the wavelet. Rearranging Eq. (C6) yields,

$$\phi(x) = \sqrt{2} \sum_n h_n\, \phi(2x - n), \quad x \in \mathbb{R}, \tag{C7}$$

where the factor of $\sqrt{2}$ arises from normalisation. The numbers $h_n = c_n / \sqrt{2}$ are called the
filter coefficients of $\phi$ and obey the following condition,

$$\sum_n h_n = \sqrt{2}. \tag{C8}$$


DSTO-RR-0087 


The solution $\phi$ of Eq. (C6) is orthogonal to its translations, i.e. $\int \phi(x)\, \phi(x - k)\, dx = 0$
for $k \ne 0$. We also desire a function which is orthogonal to its dilations, i.e.
$\int \psi(x)\, \psi(2x - k)\, dx = 0$. This is the
associated wavelet or mother of the wavelets and is generated from $\phi$ by the following
equation,

$$\psi(x) = \sqrt{2} \sum_n g_n\, \phi(2x - n), \quad g_n = (-1)^n h_{1-n}. \tag{C9}$$
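
The relation between the coefficients $c_n$, the filter coefficients $h_n$ and the wavelet filter $g_n$ can be checked numerically. The short Python sketch below does this for the four Daubechies-4 coefficients quoted later in this appendix; storing each filter as a dictionary keyed by its index n is an assumption made here for illustration, since $g_n$ is nonzero for n = -2, ..., 1.

```python
import math

SQRT3 = math.sqrt(3.0)
# Daubechies-4 coefficients c_0..c_3 of the dilation equation.
c = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]

# Filter coefficients h_n = c_n / sqrt(2), keyed by n.
h = {n: cn / math.sqrt(2.0) for n, cn in enumerate(c)}

# Wavelet filter g_n = (-1)^n h_{1-n}, which is nonzero for n = -2, ..., 1.
g = {n: (-1) ** n * h[1 - n] for n in range(2 - len(c), 2)}

sum_h = sum(h.values())                        # expected to equal sqrt(2)
sum_h_squared = sum(v * v for v in h.values()) # expected to equal 1
sum_g = sum(g.values())                        # expected to vanish
```

The three sums come out as $\sqrt{2}$, 1 and 0 respectively (to rounding error), which reflects the normalisation of the scaling filter and the fact that the wavelet filter removes the mean of a signal.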

From this, other related functions can be defined,

$$\phi_{m,n}(x) = 2^{-m/2}\, \phi(2^{-m} x - n), \quad m, n \in \mathbb{Z}, \tag{C10}$$

with the corresponding wavelets given by,

$$\psi_{m,n}(x) = 2^{-m/2}\, \psi(2^{-m} x - n), \quad m, n \in \mathbb{Z}. \tag{C11}$$

The system $\{\psi_{m,n} \mid m, n \in \mathbb{Z}\}$ is also called an orthonormal wavelet basis. In most applications
the sums given above are finite and we consider this to be the case from here on.

Eq. (C6) can be solved by constructing an M×M matrix of coefficients, where M is the
number of nonzero coefficients. This matrix can be designated by L with entries $L_{ij} = c_{2i-j}$.
It always has an eigenvalue equal to unity and the respective normalised eigenvector
consists of the values of $\phi$ at integer values of x. Once these values are known, all other
values of $\phi$ can be generated by applying the recursion equation to get the values at
half integers, quarter integers and so on to the desired dilation.
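
The construction just described can be sketched in Python for the Daubechies-4 coefficients. The code is a minimal illustration under assumptions, not the report's implementation: the eigenvector is found by power iteration (which presumes that unity is the dominant eigenvalue of L, as it is for these coefficients), it is normalised so that its entries sum to one, and the function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
c = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]

def coefficient_matrix(c):
    """M x M matrix with entries L[i][j] = c_{2i-j} (zero outside 0..M-1)."""
    m = len(c)
    return [[c[2 * i - j] if 0 <= 2 * i - j < m else 0.0 for j in range(m)]
            for i in range(m)]

def values_at_integers(c, sweeps=200):
    """Eigenvector of L for eigenvalue 1, scaled so that its entries sum to 1."""
    matrix = coefficient_matrix(c)
    m = len(c)
    phi = [1.0 / m] * m
    for _ in range(sweeps):   # power iteration; assumes 1 is the dominant eigenvalue
        phi = [sum(matrix[i][j] * phi[j] for j in range(m)) for i in range(m)]
        total = sum(phi)
        phi = [v / total for v in phi]
    return phi

def refine(c, phi, level):
    """From values on the grid k / 2**level, compute values on k / 2**(level + 1)."""
    step = 2 ** level
    finer = [0.0] * (2 * (len(phi) - 1) + 1)
    for k in range(len(finer)):
        for n, cn in enumerate(c):
            idx = k - n * step          # grid index of the point 2x - n
            if 0 <= idx < len(phi):
                finer[k] += cn * phi[idx]
    return finer

phi_int = values_at_integers(c)         # values at x = 0, 1, 2, 3
phi_half = refine(c, phi_int, 0)        # values at x = 0, 0.5, ..., 3
phi_quarter = refine(c, phi_half, 1)    # values at x = 0, 0.25, ..., 3
```

For these coefficients the values at the integers are 0, $(1+\sqrt{3})/2$, $(1-\sqrt{3})/2$ and 0, and each refinement doubles the resolution of the grid while leaving the previously computed values unchanged.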

Plots of most wavelet functions appear to be extremely irregular, which is due to the
fact that a wavelet function is non-differentiable everywhere. The functions that are
normally used in transforms consist of a few sets of well-chosen coefficients which
result in a function that has a discernible shape such as the Haar basis function or the
Daubechies-4 wavelet. The latter is often used in data compression.

In  applying  wavelet  theory  to  image  compression,  we  note  that  pixel  values  can  be 
predicted  by  considering  the  complete  image  as  a  histogram  and  then  looking  at  the 
values  of  neighbouring  pixels.  Thus  spatial  correlations  occurring  in  natural  images 
are  taken  into  account.  To  create  a  good  image  decomposition  scheme  based  on  this 
approach,  the  image  is  split  into  a  low  resolution  part  consisting  of  a  smaller  number 
of  samples  than  the  original  image  and  a  difference  signal  which  is  the  difference 
between  the  low  resolution  part  and  the  actual  image.  The  low  resolution  part  is 
actually a good estimate of the true image due to the correlations present in real world
images. The image it generates will still contain spatial correlations and hence further
decomposition can take place, thereby creating a hierarchical decomposition of the
original image.

An efficient decomposition scheme can be created by employing multi-resolution
wavelet bases, which we describe here only briefly. A more detailed discussion
appears in Refs. [64,65]. We let $\phi$ be the generator of a multiresolution wavelet basis, so
that we can put,

$$\phi_{m,n}(x) = 2^{-m/2}\, \phi(2^{-m} x - n). \tag{C12}$$

The spaces $V_m = \mathrm{span}\{\phi_{m,n} \mid n \in \mathbb{Z}\}$ correspond with different resolution levels of our
decomposition. Then there is a function, $\psi$, such that the orthogonal space
$W_m = \mathrm{span}\{\psi_{m,n} \mid n \in \mathbb{Z}\}$ satisfies the direct sum given by,

$$V_{m-1} = V_m \oplus W_m, \tag{C13}$$

where,

$$\psi_{m,n}(x) = 2^{-m/2}\, \psi(2^{-m} x - n).$$


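Furthermore, let $P_m$ and $Q_m$ represent the orthogonal projections on $V_m$ and $W_m$
respectively while the sequence $(c_n)_{n \in \mathbb{Z}}$ represents the signal undergoing compression.
The projections $P_n$ and $Q_n$ are defined respectively as,

$$P_n f = \sum_k c_k^n(f)\, \phi_{n,k} \quad \text{and} \quad Q_n f = \sum_k d_k^n(f)\, \psi_{n,k}. \tag{C14}$$

The coefficients are the inner products shown below,

$$c_k^n(f) = \langle f, \phi_{n,k} \rangle \quad \text{and} \quad d_k^n(f) = \langle f, \psi_{n,k} \rangle.$$

We define a sequence $(c_n^0)_{n \in \mathbb{Z}}$ by $c_n^0 = c_n$ and an associated function f by,

$$f = \sum_n c_n^0\, \phi_{0,n}. \tag{C15}$$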


Applying a multiresolution analysis to f means that f can be put equal to $P_1 f + Q_1 f$,
where the first term is the low resolution representation of f and the second term
represents the difference signal. Specifically,

$$P_1 f = \sum_k c_k^1\, \phi_{1,k} \quad \text{and} \quad Q_1 f = \sum_k d_k^1\, \psi_{1,k}. \tag{C16}$$

The coefficients $c_k^1$ are given by,

$$c_k^1 = \langle P_1 f, \phi_{1,k} \rangle = \langle f, \phi_{1,k} \rangle = \sum_n c_n \langle \phi_{0,n}, \phi_{1,k} \rangle = \sum_n c_n\, h_{n-2k}, \tag{C17}$$

where,

$$h_n = 2^{-1/2} \int \phi(x/2)\, \phi(x - n)\, dx.$$

The coefficients $d_k^1$ are evaluated in a similar fashion. By repeating the procedure N
times we arrive at the decomposition,

$$f = Q_1 f + Q_2 f + \cdots + Q_N f + P_N f, \tag{C18}$$

where,

$$P_N f = \sum_k c_k^N\, \phi_{N,k} \quad \text{and} \quad Q_n f = \sum_k d_k^n\, \psi_{n,k}.$$

The coefficients $c_k^n$ and $d_k^n$ are determined from the following recursive relations,

$$c^j = H c^{j-1} \quad \text{and} \quad d^j = G c^{j-1}, \tag{C19}$$

where,

$$(Ha)_k = \sum_n h_{n-2k}\, a_n \quad \text{and} \quad (Ga)_k = \sum_n g_{n-2k}\, a_n.$$

After a number of iterations, the original image sequence $c^0$ is decomposed into the
lowest resolution signal $c^N$ and the difference signals $d^N, d^{N-1}, \ldots, d^1$ of ever finer
resolution. The above analysis is a one-dimensional multiresolution representation and
can be extended to two dimensions by using products as opposed to sums in the above
results. The reader is referred to pp. 86-87 of Ref. [53] for this non-trivial exercise.
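
The recursive relations of Eq. (C19) translate directly into code. The Python sketch below applies the operators H and G repeatedly to a finite signal; it is an illustration under assumptions that the report does not state, namely that the signal is extended periodically at its ends and that the filters are stored as dictionaries keyed by their index. All function names are hypothetical.

```python
import math

SQRT3 = math.sqrt(3.0)
HAAR = [1.0, 1.0]
D4 = [(1 + SQRT3) / 4, (3 + SQRT3) / 4, (3 - SQRT3) / 4, (1 - SQRT3) / 4]

def make_filters(c):
    """h_n = c_n / sqrt(2) and g_n = (-1)^n h_{1-n}, stored as {index: value}."""
    h = {n: cn / math.sqrt(2.0) for n, cn in enumerate(c)}
    g = {1 - n: (-1) ** (1 - n) * hn for n, hn in h.items()}
    return h, g

def apply_filter(f, a):
    """(Fa)_k = sum_n f_{n-2k} a_n, with the signal a extended periodically."""
    size = len(a)
    return [sum(fm * a[(m + 2 * k) % size] for m, fm in f.items())
            for k in range(size // 2)]

def decompose(signal, c, levels):
    """Return the coarsest signal c^levels and the details [d^1, ..., d^levels]."""
    h, g = make_filters(c)
    approx, details = list(signal), []
    for _ in range(levels):
        details.append(apply_filter(g, approx))   # d^j = G c^{j-1}
        approx = apply_filter(h, approx)          # c^j = H c^{j-1}
    return approx, details

signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
coarse, details = decompose(signal, D4, 2)
```

Each pass halves the length of the low resolution signal, so two passes over eight samples leave two coarse values together with difference signals of lengths four and two. With orthonormal filters and periodic extension the sum of squares of all outputs equals that of the input.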

In summary, the Discrete Wavelet Transform (DWT) in one dimension produces two
output sequences, referred to as "odd" and "even", from an input sequence. These can
be viewed as a pair of convolution functions or Finite Impulse Response (FIR) filters
[55]. Both filters create an output stream that is half the length of the original input. In
many situations, the low-pass filter output or odd output contains most of the
information content of the original signal and is related to Eq. (C6) by,

$$a_i = \frac{1}{2} \sum_{j=1}^{N} c_{2i-j+1}\, f_j, \quad i = 1, 2, \ldots, N/2, \tag{C20}$$

where $c_{2i-j+1}$ are the wavelet coefficients, $f_j$ is the input function of block size N and $a_i$
are the odd output values. For the Haar wavelet there are only two coefficients, $c_0 = 1$
and $c_1 = 1$, while for the Daubechies-4 wavelet there are four coefficients, $c_0 = (1+\sqrt{3})/4$,
$c_1 = (3+\sqrt{3})/4$, $c_2 = (3-\sqrt{3})/4$ and $c_3 = (1-\sqrt{3})/4$. In general, higher order wavelets, i.e. those
with more nonzero coefficients, tend to put more information in the odd output and
less into the even output.

The high-pass filter output or even output contains the difference between the true
input and the value of the reconstructed input if it were to be reconstructed from only
the information given in the odd output. The even output values $b_i$ can also be
expressed in terms of wavelet coefficients as,

$$b_i = \frac{1}{2} \sum_{j=1}^{N} (-1)^{j+1} c_{j+2-2i}\, f_j, \quad i = 1, 2, \ldots, N/2. \tag{C21}$$

An important step in wavelet data compression is determining those wavelet functions
which result in the even terms being almost zero [53]. In fact, if the average amplitude
of the even output is sufficiently low, then the even half of the signal can be discarded
without significant degradation occurring in the reconstructed signal. Since most of
the information is held in the low-pass filter output, this can again be transformed into
two new sets of data. If the number of input samples is $N = 2^D$, then a maximum of D
dilations can be performed with the last dilation resulting in a single low-pass value
and a single high-pass value. Thus successive dilations represent lower and lower
frequency content by halves. In addition, to obtain high compression rates, it may be
necessary to begin with large blocks of input so that not only more dilations can be
carried out, but also lower frequencies can be represented in the decomposition.
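
The counting argument for $N = 2^D$ and the effect of discarding high-pass output can be illustrated with the Haar wavelet. The Python sketch below is an assumption-laden toy rather than the report's scheme: each pass replaces a pair of samples by its average (low-pass) and its half-difference (high-pass), which matches the factor of one half in Eq. (C20) for $c_0 = c_1 = 1$ but fixes a pairing of samples that the report does not spell out.

```python
def haar_pass(values):
    """Split a signal of even length into low-pass and high-pass halves."""
    low = [(values[i] + values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2.0 for i in range(0, len(values), 2)]
    return low, high

def full_decomposition(signal):
    """Repeat the pass until a single low-pass value remains (len must be 2**D)."""
    low, highs = list(signal), []
    while len(low) > 1:
        low, high = haar_pass(low)
        highs.append(high)            # finest detail first
    return low[0], highs

def reconstruct(low_value, highs):
    """Invert full_decomposition: each pair is (a + b, a - b)."""
    low = [low_value]
    for high in reversed(highs):
        low = [v for a, b in zip(low, high) for v in (a + b, a - b)]
    return low

signal = [8.0, 8.0, 9.0, 7.0, 3.0, 5.0, 4.0, 4.0]      # N = 8 = 2**3
low_value, highs = full_decomposition(signal)
exact = reconstruct(low_value, highs)
# Discard the finest high-pass output, whose amplitude is small here.
coarse = reconstruct(low_value, [[0.0] * len(highs[0])] + highs[1:])
```

Three passes reduce the eight samples to a single low-pass value, the mean 6.0. Dropping the four finest high-pass values halves the amount of data to be kept and changes no sample by more than one grey level in this example.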

Basically three parameters are required in implementing a wavelet coding scheme:

(i) the filter length reflecting the number of coefficients that describe the
wavelet function,

(ii) the block size N of the input, which must be a power of two, and

(iii) the number of dilations or passes of the input stream.

For $N = 2^D$, D dilations are possible for full decomposition, but this is not always
suitable in compressing data [66].

After the transformation steps are completed quantisation is usually performed. A
finite number of real-valued coefficients is selected to form a quantisation grid. Each
coefficient is then replaced by the nearest point in the grid. The grid can be chosen by
taking evenly spaced points or by choosing points closer together near zero. The
quantised coefficients are then coded [53,55].
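
Both ways of choosing the grid can be shown in a few lines of Python. This is a minimal sketch: the step size, the example grid with points clustered near zero and the function names are illustrative assumptions, and no particular coding of the quantised values is implied.

```python
def quantise_uniform(coefficients, step):
    """Replace each coefficient by the nearest multiple of `step`."""
    return [step * round(v / step) for v in coefficients]

def quantise_to_grid(coefficients, grid):
    """Replace each coefficient by the nearest point of an arbitrary grid."""
    return [min(grid, key=lambda point: abs(point - v)) for v in coefficients]

coefficients = [0.12, -0.49, 0.26]
evenly_spaced = quantise_uniform(coefficients, 0.25)
# Assumed example grid with points placed closer together near zero.
dense_near_zero = [-1.0, -0.25, -0.05, 0.0, 0.05, 0.25, 1.0]
clustered = quantise_to_grid(coefficients, dense_near_zero)
```

With the evenly spaced grid the three coefficients become 0.0, -0.5 and 0.25, whereas the grid that is denser near zero maps them to 0.05, -0.25 and 0.25, spending its resolution on small coefficients.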

An exact reconstruction of the image can only be made if the decomposition
coefficients are known exactly, which is not possible since they are not integers. Thus,
quantisation of the coefficients is required, as for the DCT discussed in
Appendix A. In the one dimensional case the original image can be reconstructed by
repeated use of the relation,

$$P_{j-1} f = P_j f + Q_j f = \sum_k c_k^j\, \phi_{j,k} + \sum_k d_k^j\, \psi_{j,k}. \tag{C22}$$

This implies that,

$$c_n^{j-1} = \langle P_{j-1} f, \phi_{j-1,n} \rangle = \sum_k c_k^j \langle \phi_{j,k}, \phi_{j-1,n} \rangle + \sum_k d_k^j \langle \psi_{j,k}, \phi_{j-1,n} \rangle = \sum_k c_k^j\, h_{n-2k} + \sum_k d_k^j\, g_{n-2k}. \tag{C23}$$

Reconstruction of the original image in either one or two dimensions relies on the
correct choice of the function $\phi$.
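
One step of this reconstruction can be written out for the Haar wavelet. The Python sketch below is illustrative only: it assumes a periodically extended signal, stores the filters as dictionaries keyed by index, and uses hand-computed example coefficients for the four-sample signal 4, 6, 10, 12; the function name is hypothetical.

```python
import math

def synthesise(coarse, detail, h, g):
    """c^{j-1}_n = sum_k c^j_k h_{n-2k} + sum_k d^j_k g_{n-2k} (periodic signal)."""
    size = 2 * len(coarse)
    out = [0.0] * size
    for k in range(len(coarse)):
        for m, hm in h.items():
            out[(m + 2 * k) % size] += coarse[k] * hm   # index n = m + 2k
        for m, gm in g.items():
            out[(m + 2 * k) % size] += detail[k] * gm
    return out

r = 1.0 / math.sqrt(2.0)
haar_h = {0: r, 1: r}          # h_n = c_n / sqrt(2) with c_0 = c_1 = 1
haar_g = {0: r, 1: -r}         # g_n = (-1)^n h_{1-n}

# Haar coefficients of the signal 4, 6, 10, 12 (pair sums and differences / sqrt(2)).
coarse = [10.0 * r, 22.0 * r]
detail = [-2.0 * r, -2.0 * r]
restored = synthesise(coarse, detail, haar_h, haar_g)
```

The call returns the samples 4, 6, 10 and 12 up to rounding error. If the detail coefficients had been quantised, the same routine would return the corresponding approximation instead.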

By  implementing  an  equally  spaced  quantisation  scheme  Nacken  (p.  81  of  Ref.  [53]) 
has  achieved  bit  rates  as  low  as  0.4  bits  per  pixel  (95%  data  reduction)  whilst  at  the 
same  time  maintaining  a  high  quality  of  reconstructed  image.  Fine  details  such  as 
small  bright  spots  are  preserved  better  with  this  approach  due  to  the  localisation  of  the 
low  level  basis  functions. 
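
The quoted data reduction follows from simple arithmetic once the bit depth of the original image is fixed. The Python sketch below assumes an 8 bit per pixel grey scale original, which is an assumption made here for illustration and is not stated alongside the quoted figures.

```python
def compression_summary(bits_per_pixel, original_bits_per_pixel=8):
    """Return (compression ratio, fractional data reduction) for a coded bit rate."""
    ratio = original_bits_per_pixel / bits_per_pixel
    reduction = 1.0 - bits_per_pixel / original_bits_per_pixel
    return ratio, reduction

# 0.4 bits per pixel against an assumed 8 bits per pixel original.
ratio, reduction = compression_summary(0.4)
```

Under that assumption 0.4 bits per pixel corresponds to a compression ratio of 20:1 and a data reduction of 95%.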

